ABSTRACT


The development of large-scale integrated circuits in
the last few years has resulted in a rapid increase in the
number of digital devices available for electronic system
design. Skilled system designers often do not have an
abundance of software experience and require better tools than
are presently available in order to take maximum advantage
of microprocessors and other functional building blocks. A
case is made for the development of a high-level language
compiler which will allow the designer to specify not only
his algorithm but also his hardware configuration and his
optimization constraints. The PL/M compiler developed by
Intel Corporation is used as a model for examining some of
the requirements of this "machine-independent" compiler. A
summary of work which was done to implement the first stages
of such a compiler is presented, and factors which must be
considered in order to further this work are discussed.


I.   INTRODUCTION


A.   PROBLEM  DEFINITION 

The most promising and widely discussed device in the
electronics industry during the last few years has been the
microprocessor. This device packages the central processing
unit and associated elements of a digital computer into a
handful of integrated circuit chips; in many cases only one
chip is used. Since the advent of the microprocessor in
1971 it has become much easier to incorporate the power of a
digital computer into the design of an electronic system.
Compared with custom Large Scale Integration (LSI) circuits,
microprocessors are convenient, flexible, low-cost devices
which have allowed sophisticated features to be made
available in relatively simple systems. As a result their use
has expanded rapidly, and many people who have had limited
programming experience are now being forced to write
programs as part of their design efforts.

The term "firmware" has come to be used for systems
which utilize programmable digital components, since the
development of such systems requires both hardware and
software design. The design of a firmware system, whether
it uses a microprocessor or some other means of providing a
programming capability, is a complex task requiring the best
skills of both the electronics engineer and the computer
programmer.


Another development which, although conceived in the
early 1950's, has become significant only in the last decade
is the use of microprogramming in digital systems.
Microprogramming differs from "normal" programming primarily
in the level of detail considered. Each instruction in the
instruction set of a typical digital computer requires
several hardware operations to be performed, but these
operations are transparent to the programmer. In
microprogrammable systems each of these primitive operations
may be invoked by a microinstruction. Initially, as in the IBM
System/360, microprogramming was done only by the
manufacturer, but today there are general purpose computers
(e.g., the Hewlett-Packard HP-2100 and Burroughs "D" machine)
which allow user microprogramming. In addition there are
proposals for using standard functional modules in the
implementation of special purpose digital systems [56]. These
modular systems will be controlled by what amount to
microprograms.

As in the case of the microprocessor, microprogrammed
systems will in most instances be programmed by engineers
who have a firm background in hardware design but who may
have minimal software experience. Thus it is becoming
increasingly necessary that programming languages be developed
which are easy to use and which can produce good control
code for a variety of architectures. The compiler for such
a language could be considered a software computer aided
design (CAD) tool for the engineer. Ideally, it would
accept a description of the algorithm to be performed (the
program) and descriptions of the hardware and the format of
the control code; the output would then be a control program
to perform the algorithm.
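
The idea of feeding one algorithm description and a separate
hardware description to the same translator can be pictured with
a toy sketch. It is only an illustration under stated
assumptions: the two "machines", their mnemonics, and the names
PROGRAM, ACCUMULATOR_MACHINE, STACK_MACHINE, and generate are all
hypothetical and are not taken from this thesis or from any real
instruction set.

```python
# Toy sketch: one algorithm description, two hardware descriptions.
# Every name and mnemonic here is invented for illustration only.

# The "program": c = a + b, written as machine-neutral operations.
PROGRAM = [("load", "a"), ("add", "b"), ("store", "c")]

# Each hardware description maps a neutral operation to a template
# in that machine's (hypothetical) control-code format.
ACCUMULATOR_MACHINE = {
    "load": "LDA {0}",
    "add": "ADD {0}",
    "store": "STA {0}",
}

STACK_MACHINE = {
    "load": "PUSH {0}",
    "add": "PUSH {0}\nADDS",   # push the operand, then add the top two
    "store": "POP {0}",
}


def generate(program, hardware):
    """Translate neutral operations into control code for one target."""
    lines = []
    for operation, operand in program:
        template = hardware[operation]
        # A template may expand to more than one control-code line.
        lines.extend(template.format(operand).split("\n"))
    return lines
```

Calling generate(PROGRAM, ACCUMULATOR_MACHINE) and
generate(PROGRAM, STACK_MACHINE) yields two different control
programs from the same algorithm description, which is the
property the ideal compiler described above would have on a much
larger scale.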

Succeeding chapters of this thesis examine some of the
considerations necessary in the development of a programming
language for firmware system design. Chapter II contains a
discussion of programming languages and the advantages and
disadvantages of high-level languages. The language PL/M is
presented in Chapter III and is used as a basis for
examining the necessary features of a high-level language. The
implementation of pass 1 of a PL/M compiler is described in
Chapter IV. This chapter also describes some of the
theoretical aspects of programming language design and
implementation. The output of pass 1 is an intermediate
language representation of a source language program, an
important concept which is discussed in Chapter V.

A major factor in the implementation of any digital
system is the system architecture. Chapter VI contains
descriptions of various types of architectures and a
discussion of the influence of architecture on language design.
Optimization of the output code is another important
consideration in the design of a compiler. Many firmware
systems will be produced in large numbers, and the amount of
hardware used will have a significant impact on the cost.
Because of the fierce competition among manufacturers, good
optimization techniques will be critical in the
implementation of these systems. Chapter VII is devoted to
the topic of compiler optimization.

The topics discussed in Chapters IV-VII are tied
together in Chapter VIII, which shows how they all influence the
design of a compiler for user-definable architectures.
Chapter IX summarizes the conclusions reached during the
study of the problem and presents a list of recommendations
for future work.

B.   SOFTWARE  ENGINEERING 

With the rapidly growing use of digital techniques in
electronic system design has come the emergence of a new
discipline, that of software engineering, to address
problems at the hardware-software interface. Although digital
computers have been in existence for more than 30 years, it
is only today becoming widely recognized that the software
design considerations are at least as important as the
hardware design considerations in digital system development
[11]. The acceptance of the fact that software problems are
of more than academic interest is highlighted by the
recently inaugurated publication of a new technical journal--the
IEEE Transactions on Software Engineering. Because software
engineering addresses many issues which are very closely
related to the firmware design problem, its goals and
principles--as defined by Ross, Goodenough, and Irvine
[51]--are outlined below. These ideas have a strong
influence on much of the remainder of this thesis.

The goals of software engineering are:

1)  Modifiability--This refers to the ability to make
    controlled changes in a program. In a large system
    software modifications have to be made during
    development as well as after production has begun.
    Modifications may be made either to correct errors or to
    change or add features and provide varying levels of
    performance (i.e., a "family" of systems).

2)  Efficiency--This goal is concerned with the best
    utilization of the resources available. Typically this
    means using the least memory and time in performing the
    task. Efficiency is usually "... prematurely permitted
    a high priority in engineering tradeoffs ... (but) does
    not dominate the practice of software engineering."
    [51, p. 20-21]

3)  Reliability--This is a critical goal, especially for
    software systems used in real-time control
    applications. Unfortunately reliability has too often in the
    past been considered as secondary to efficiency in
    software development.

4)  Understandability--This goal supports the goals of
    modifiability and reliability. If a piece of software
    is easily understandable, it is easy to modify and easy
    to check for reliability. It is unfortunate that
    understandability is usually considered to reduce
    efficiency, but this relationship does not necessarily
    hold. Increased understandability can lead to the
    detection of inefficiencies in a large system.

There are seven principles of software engineering which
may be applied in order to achieve the goals. These
principles are:

1)  Modularity--This refers to the purposeful structuring
    of a system. Modularity is an important principle in
    both hardware and software design.

2)  Abstraction--The unessential details are omitted at any
    given level in the design, leaving only abstract
    concepts for consideration.

3)  Localization--Limiting the scope of a structure or a
    concept is closely related to modularity. Localization
    enhances confirmability and understandability.

4)  Hiding--"... [T]he purpose of hiding is to make
    inaccessible certain details that should not affect other
    parts of a system." [51, p. 22]

5)  Uniformity--It is important that definitions and
    concepts be applied uniformly across a system.

6)  Completeness--Specifying all details and leaving
    nothing to chance greatly increases reliability.

7)  Confirmability--This refers to the ability to determine
    whether all the design goals have been met.

Software engineering is concerned with the question of
whether it is more important to have very efficient code, in
the sense that it uses the minimum amount of memory and
executes at the maximum rate (two goals which, by the way, are
usually not compatible), or whether it is more important to
have code which is reliable, easy to modify, and takes a
minimum amount of time to develop. As in all engineering
disciplines, software engineering is involved with making
tradeoffs among the various alternatives, since no one
answer will be correct in all situations.


II.   PROGRAMMING LANGUAGES

In the early days of digital computer use it became
evident that an alternative to machine language programming was
needed. A computer program is really nothing more than a
series of binary digits contained in some storage medium,
but human engineering dictated that early machine language
programs be represented as groups of octal digits. With the
introduction of mnemonics and assemblers to translate them,
programs became almost readable. Assemblers became more and
more sophisticated with the addition of macros, comment
fields, and conditional assembly features, but programs
still were tedious to write and difficult to read. The
drawback of assembly language programs is that they contain
too much information about the operation of the hardware
(contrary to the principle of abstraction), and this tends
to obscure information related to the algorithm being
implemented. Since there is essentially a one-to-one
correspondence between assembly language instructions and machine
instructions, assembly language programs still tend to be
very cumbersome and error-prone except when used for very
simple problems. Thus, as programs became increasingly
complex, high-level languages were introduced.


A.   THE CASE FOR HIGH-LEVEL LANGUAGES

The development of high-level languages was spurred by
the desire to be able to write programs which are more
descriptive of the problems being solved and which depend
less on the actual hardware on which the programs are to
execute. "High-order languages represent a concept for
improving the understandability of programs by abstracting
from the details of computer instruction sets." [51, p. 19]
These languages are designed to facilitate description of
the procedural steps involved in problem solution, and thus
they are often referred to as procedure-oriented languages
(as opposed to machine-oriented assembly languages).

The main advantages of programming in a high-level
language are:

1)  The programmer is freed from the consideration of many
    minor details. These details are mainly in the nature
    of bookkeeping--memory allocation, register allocation,
    assignment of temporary variables to hold the results
    of partial computations, remembering branch locations,
    type checking of variables, and many others. This
    factor is becoming even more important with the increased
    use of microprogrammable systems. "The inability of a
    user to cope with a highly intricate, time-and-machine
    dependent environment often results in inefficient, if
    not error prone, microprograms." [47, p. 791]

2)  Efficient control structures greatly reduce the burden
    of programming, resulting in increased reliability.

3)  Symbolic user variables increase the readability of the
    program. This is also one of the advantages assembly
    languages have over machine languages.

4)  The ability to write arbitrary arithmetic expressions
    also increases readability and tends to reduce
    computational errors.

5)  Programmer productivity is increased because of the
    expansion factor involved in the translation from
    high-level language to machine language. It is
    generally recognized that programmers produce, on average
    in a large project, only a few lines of code per day,
    whether it be machine code, assembly code, or
    high-level language code. Since each high-level statement
    expands into several machine instructions, the same
    daily output of lines yields correspondingly more
    machine code.

6)  Documentation is improved, because the program is more
    understandable. A good high-level language encourages
    the writing of programs which are essentially
    self-documenting.

7)  Maintenance, modification, and debugging are
    facilitated because of the improved readability and
    documentation.

8)  Transportability is improved, because a high-level
    language has little dependence on a particular machine
    architecture. In fact one goal of language design is
    complete machine independence. This topic is covered
    more fully in Chapter VIII.

By far the most popular criticism of high-level
languages is based upon the concern for efficiency. There
are basically two sources of inefficiency. The first has to
do with the fact that in certain instances some languages
are too machine independent in that they do not recognize
features which are basic to computer hardware. For example,
FORTRAN does not contain primitive operations for bit
manipulation (shifting, rotating, masking, etc.). A good
high-level language should not restrict the programmer from doing
anything that he could do at the assembly or machine
language levels.
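
The bit-manipulation primitives named above can be made concrete
with a minimal sketch. It assumes an 8-bit byte and uses
hypothetical helper names (shift_left, shift_right, rotate_right,
mask); it is not the definition of any particular language's
built-in operations.

```python
# Sketch of byte-wide bit manipulation; helper names are hypothetical.
BYTE_MASK = 0xFF  # keep every result within one 8-bit byte


def shift_left(value, count):
    """Shift left, discarding bits that leave the byte."""
    return (value << count) & BYTE_MASK


def shift_right(value, count):
    """Logical shift right; zeros enter from the left."""
    return (value & BYTE_MASK) >> count


def rotate_right(value, count):
    """Rotate right; bits leaving on the right re-enter on the left."""
    count %= 8
    value &= BYTE_MASK
    return ((value >> count) | (value << (8 - count))) & BYTE_MASK


def mask(value, pattern):
    """Keep only the bits of value selected by pattern."""
    return value & pattern & BYTE_MASK
```

For instance, rotate_right(0b00000001, 1) moves the low bit into
the high position, whereas shift_right(0b00000001, 1) simply
discards it. A language without such primitives forces the
programmer to simulate them with arithmetic, which is the kind of
restriction the paragraph above objects to.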

The second source of inefficiency lies within the code
generation process and is really a characteristic of the
compiler rather than the language. The complaints most
often voiced by those opposed to the use of a high-level
language are that the compiler generates too much code and
that the code generated is wasteful of time. However, the
point is usually demonstrated with only a small program
[19,43].

These inefficiencies are really local in nature, since
each instance can usually be isolated to a few lines of
code. A good assembly language programmer can write locally
"optimal" code, but in a large program his code will suffer
from global inefficiencies (see advantage (1) above). Thus
"... data based upon comparisons between small programs will
tend to underestimate the advantage of the higher level
language for large programs." [23, p. 214] Many large
programs written in high-level languages would have been very
difficult and costly to write in assembly language [34] and
probably would have been less efficient.

It is doubtful whether any compiler will ever be able to
generate completely locally optimal code (as compared with
assembly language versions), but there are many promising
techniques emerging (see Chapter VII). Experience has shown
that global inefficiency is a nonlinear function of program
length, and a good compiler can usually produce more
efficient code than an assembly language programmer for programs
longer than about 50 to 100 high-level language statements
[23]. For shorter programs an engineering decision must be
made as to whether the advantages of programming in a
high-level language outweigh the loss in efficiency. A more
complete discussion of this topic is presented in Section
VII.A. As memory costs continue to fall, the extra code
generated by the local inefficiencies in a compiler will
take on lesser significance even in small system design
projects.

B.   SYSTEM  PROGRAMMING  LANGUAGES 

An area very closely related to firmware design is that
of system programming. For many years system programmers
have avoided the use of high-level languages, and for the
same reason that the designers of programmable hardware
(firmware) are now avoiding them--inefficiency. In addition
to the fact that software engineering considerations are
causing this position to be reevaluated, many advances have
been made in the past few years in the area of programming
language design. The development of good,
machine-independent, high-level languages for system programming has
been studied for several years [21,40], and a few languages
have been implemented.

The UNIX operating system [50], designed for the popular
PDP-11 series of minicomputers, is currently in use at more
than 50 installations around the country (including the
Computer Science Laboratory at the Naval Postgraduate School)
and was written almost entirely in the C language [49], an
Algol-like high-level language. In fact, the C compiler
itself was written in C. The fact that an interactive,
multi-user operating system as sophisticated as UNIX can be
implemented satisfactorily on a minicomputer confirms the
viability of high-level language programming in a situation
requiring efficient machine code.

C.   COMPOSITE  LANGUAGES 

One alternative solution to the problem of choosing
between a high-level language and an assembly language is
the composite language--a language which has (hopefully) the
best features of both. The simplest implementation is a
high-level language which allows assembly code to be
inserted into a program. PL/360 is an example of this type of
language. The advantage of using such a language is claimed
to lie in the ability to make use of the efficiency of
the assembly language while retaining the benefits of
high-level language programming.

Aside from the loss of understandability there are two
major disadvantages in using this approach. First is the
loss of machine independence, which reduces
transportability. Each time the architecture of the hardware is changed
(e.g., by using a different processor or by rearranging a
modular system), the program must be carefully examined for
instructions which need to be changed. The second
disadvantage is the reduction in reliability brought about by the
fact that the programmer is allowed access to facilities
which normally are completely controlled by the compiler.
This can lead to conflicts (e.g., in resource allocation)
and may cause unexpected results and subtle side effects
which are difficult to trace.

These disadvantages were partially overcome in the
implementation of the Language for Systems Development (LSD)
[40]. In LSD the use of assembly language is restricted to
macros whose definitions are separate from the program
itself. Except for the fact that the notation involved seems
somewhat clumsy, this approach probably comes very close to
the ideal notion of a machine-independent compiler.

A slightly different approach was taken by Popper [43]
in the implementation of SMAL, which is in essence an
assembly language with some of the structure of a high-level
language. A SMAL program equivalent to the example
presented in Section VII.A was written by Popper, and it required
only four more bytes of memory than the assembly language
version. Although the SMAL version is easier to read than
the assembly language version, it is more difficult to read
than the PL/M version.

Composite approaches such as Popper's probably will be
very beneficial for programmers who are designing small
microprocessor systems, but they do not appear to provide the
best long-range solution to the firmware design problem.
Succeeding sections will make it evident that compiler
theory is advancing to the point of favoring the development
of high-level languages which do not allow such highly
machine-dependent features as are found in composite
languages.




III.   THE  PL/M  LANGUAGE 

In the effort to provide comprehensive software support
for its eight-bit microprocessors, Intel Corporation was led
naturally in 1973 to the development of the high-level
language PL/M [29,53]. Since then several other
microprocessor manufacturers have announced either the availability
or the anticipated availability of PL/M compilers (with
possibly some slight modifications) for their microprocessors.
The first large scale application of the language by Intel,
ironically, was in the development of a sophisticated
macro-assembler to run on its Intellec 8 microcomputer
developmental system.

PL/M is derived from the XPL compiler-writing language
[42], which in turn is a derivative of PL/I. Thus PL/M is
very closely related to both of these languages in its
syntax and semantics. A complete list of the syntactic
productions is given in the Intel reference manual [29], and the
syntax and semantics of the C language version used for this
investigation are given in Appendix ti (see file "m.gram").
It should be noted that the syntax for the C version is not
written in the standard BNF notation but rather in the
notation required by YACC (see Section IV.B.1).

There have been many proposals over the years for the
development of machine-independent programming languages
(e.g., MPL [18], which is also similar to XPL). PL/M,
although not currently machine-independent, has the
advantage of having been implemented and used for practical
system development. Thus PL/M was chosen as the vehicle for
examining some of the considerations in the development of a
machine-independent high-level language for firmware system
design. The remainder of this section is devoted to a brief
description of the language and a discussion of its
advantages and possible shortcomings.

A.   LANGUAGE FEATURES

Lloyd and Van Dam have defined a high-level language to
be one which has the following features:

(1)  Symbolic user variables (allocated by the compiler),
(2)  Ability to evaluate arbitrary arithmetic or logical
     expressions,
(3)  Flow of control statements beyond simple (conditional
     and unconditional) GOTO, SKIP, Branch and Link.
     [38, p. 537]

In his search for a high-level programming language,
Eckhouse found the need for one that was "... procedural,
descriptive, flexible, and possibly machine-independent."
[17, p. 169] PL/M has all of these features, including a
limited kind of machine-independence. The latter feature is
exhibited in the ability of PL/M programs to be compiled for
either the 8008 or the 8080 microprocessor. Although these
two devices are both manufactured by Intel and have somewhat
similar instruction sets, they have different architectures
and a significant difference in the flexibility and speed of
execution of their instructions. These points will be
explored further in Chapters VII and VIII.

Like its predecessors, PL/M is a block-structured
language with a comprehensive set of control structures.
"... [T]he control structures of sequential flow,
conditional selection, and iteration are sufficient to implement any
algorithm." [60, p. 35] Sequential flow is provided by the
simple statement and the DO-END group, while conditional
selection is accomplished by three constructs: IF-THEN and
IF-THEN-ELSE statements and DO CASE groups. The DO FOR
group is used for a fixed number of iterations, and the DO
WHILE group is used for iterating until some condition is
satisfied. The GOTO statement is also provided in PL/M for
use in those rare circumstances where the use of the other
control structures may be somewhat awkward. In recent years
considerations of software engineering have discouraged
indiscriminate use of the GOTO since "... goto-free
programming forces programmers to make explicit the conditions
under which a given statement is executed, and this can help
ensure understandability and prevent errors." [51, p. 21]

PL/M is relatively easy to learn and read and has a
simple character set. This latter factor may be important,
since a language intended for use in a wide variety of
design environments should not require special character
sets such as those of APL or some of the proposed
microprogramming languages (e.g., see [47]). Ease of learning and
readability are important in increasing programmer
productivity and program modifiability and reliability.

In order to give a more complete picture of the features
of PL/M, a sample program [29] is presented in Figure 1.

    2048:   /* is the origin of this program */
    declare tto literally '2', cr literally '15q',
            lf literally '0ah',
            true literally '1', false literally '0';

    square$root: procedure(x) byte;
        declare (x,y,z) address;
        y = x; z = shr(x+1,1);
            do while y <> z;
            y = z; z = shr(x/y + y + 1, 1);
            end;
        return y;
        end square$root;

    print$char: procedure(char);
        declare bit$cell literally '91',
            (char,i) byte;
        output(tto) = 0;
        call time(bit$cell);
            do i = 0 to 7;
            output(tto) = char;  /* data pulses */
            char = ror(char,1);
            call time(bit$cell);
            end;
        output(tto) = 1;
        call time(bit$cell + bit$cell);
        /* automatic return is generated */
        end print$char;

    print$string: procedure(name,length);
        declare name address,
            (length,i,char based name) byte;
            do i = 0 to length - 1;
            call print$char(char(i));
            end;
        end print$string;

    print$number: procedure(number,base,chars,zero$suppress);
        declare number address,
            (base,chars,zero$suppress,i,j) byte;
        declare temp (16) byte;
        if chars > last(temp) then chars = last(temp);
            do i = 1 to chars;
            j = number mod base + '0';
            if j > '9' then j = j + 7;
            if zero$suppress and i <> 1 and number = 0 then
                j = ' ';
            temp(length(temp)-i) = j;
            number = number / base;
            end;
        call print$string(.temp+length(temp)-chars, chars);
        end print$number;

    declare i address,
        crlf literally 'cr,lf',
        heading data (crlf,lf,lf,
        ' table of square roots',
        crlf,lf,
        ' value  root value  root value  root value  root',
        ' value  root',
        crlf,lf);

    /* silence tty and print computed values */
    output(tto) = 1;
        do i = 1 to 1000;
        if i mod 5 = 1 then
            do; if i mod 250 = 1 then
                call print$string(.heading,length(heading));
            else
                call print$string(.(cr,lf),2);
            end;
        call print$number(i,10,6,true /* true suppresses
                                         leading zeroes */);
        call print$number(square$root(i),10,6, true);
        end;

    declare monitor$uses (10) byte;
    eof


Figure 1.   Sample PL/M program
for  computing  square  roots  (after  (29)) 
This program (as well as most other PL/M and C programs
reproduced in this thesis) is written in lower case characters,
since this is the normal input mode for the UNIX operating
system, which was used for all of the work described. In
addition to the features previously mentioned, notice should be
taken of the comment convention of the language. Since comments
can be placed anywhere within a program (rather than on separate
lines as in FORTRAN), self-documentation is encouraged. Although
the "/* */" convention is a little awkward, it has the advantage
of setting off comments and not discouraging short comments (as
does the "COMMENT" convention in ALGOL).

Figure 2 presents a second sample PL/M program which
demonstrates another significant feature of the language: the
nested macro-definition capability. While the macro-definition
concept is certainly not new, there are many languages which do
not allow macros (most notably FORTRAN and ALGOL). Many
languages which do have a macro capability do not allow nesting.
As can be seen, the macros increase the readability of the
program, but there is another, perhaps greater, advantage in
using them. By using macros the programmer can specify certain
items only once in a program (e.g., vector sizes and
input/output ports) and then use the macro names elsewhere in
the program to refer to those items. Later he can modify his
program by merely changing the appropriate macro definitions.
While the advantages of being able to do this are not as evident
in the
/* paper tape reader controller program */
declare
    forever  literally 'while 1',
    cdata    literally 'output(1)',
    cstat    literally 'output(2)',
    ccom     literally 'input(2)',
    ccps     literally 'input(3)',
    rdata    literally 'input(1)',
    rstat    literally 'input(0)',
    rcom     literally 'output(0)',
    noreq    literally 'not ccom',
    acon     literally 'ror(rstat,1)',
    perr     literally '10b',
    badcps   literally '100b',
    ok       literally '> 3 and cps < 26',
    rrdy     literally 'rstat',
    clk1     literally '1b',
    clk0     literally '0b',
    drdy     literally '1b';

declare cps byte,
    wait(22) byte initial
    (250,200,167,143,125,111,100,91,83,77,71,
    67,63,59,56,53,50,48,45,43,42,40);

    do forever;
    cstat = 0;
    /* wait until read request */
        do while noreq;
        end;
    /* determine the characters per second rate */
    cps = ccps;
    if acon and rrdy then
        do;
        if cps ok then   /* we are ready */
            do;          /* to read characters */
            cdata = rdata;
            rcom = clk1;
            rcom = clk0;
            cstat = drdy;
            cstat = 0;
            /* wait for tape to move */
            call time(wait(cps - 4));
            end; else cstat = badcps;
        end; else cstat = perr;
    end;
eof


Figure 2.   Another sample PL/M program
short program of Figure 2 as they would be in a large program,
it should be apparent that this will increase the modifiability,
and consequently the reliability, of programs written in the
language.
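
The single point of change described above can be sketched with the C preprocessor, whose macros may also be nested. This is only an illustration under stated assumptions: the macro and function names (CPS_MIN, CPS_MAX, WAIT_ENTRIES, CPS_OK, delay_for_rate) are hypothetical, and only the 22 table values are taken from Figure 2.

```c
#include <assert.h>

/* Hypothetical limits: changing them here changes every use below. */
#define CPS_MIN       4                          /* slowest supported rate  */
#define CPS_MAX       25                         /* fastest supported rate  */
#define WAIT_ENTRIES  (CPS_MAX - CPS_MIN + 1)    /* nested: built from both */
#define CPS_OK(cps)   ((cps) >= CPS_MIN && (cps) <= CPS_MAX)

/* One delay value per supported rate (values as listed in Figure 2).
   The array is sized by the macro rather than by a literal. */
static const unsigned char wait_table[WAIT_ENTRIES] = {
    250, 200, 167, 143, 125, 111, 100, 91, 83, 77, 71,
    67, 63, 59, 56, 53, 50, 48, 45, 43, 42, 40
};

/* Return the delay for a rate, or -1 if the rate is out of range. */
int delay_for_rate(int cps)
{
    assert(sizeof wait_table == WAIT_ENTRIES);
    if (!CPS_OK(cps))
        return -1;
    return wait_table[cps - CPS_MIN];
}
```

If the supported range were later changed, only the two limit macros and the table contents would need editing; the range test and the indexing would follow automatically.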

Another, less significant, factor which increases readability is
the inclusion of the separator "$" in some of the long
identifiers in the program of Figure 1. This character is
ignored by the scanner in the PL/M compiler when included within
identifiers and numbers.
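
A scanner can discard the separator while it copies the characters of a token. The routine below is a minimal sketch of that idea, not the code of the PL/M scanner; the name strip_dollar and its interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst, dropping every '$' on the way.
   dst must have room for strlen(src) + 1 bytes.
   Returns the length of the resulting string. */
size_t strip_dollar(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src != '$')
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

With this treatment "print$char" and "printchar" name the same identifier, which is why a procedure may be closed with either spelling.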

Examination of the PL/M manual [29] and the programs in Figures
1 and 2 will reveal that PL/M contains functions which relate
directly to the Intel 8008 and 8080 instruction sets. Thus PL/M
fits the definition given by Lloyd and Van Dam for a "tailored"
language: "A language whose features are explicitly designed to
coincide (to a large extent) with the hardware capabilities of
its object machine ...." [38, p. 540] Fortunately this is not as
serious a drawback as it might seem, as evidenced by the fact
that other microprocessor manufacturers are now developing or
have developed PL/M compilers for their machines. All of the
functions in PL/M which relate specifically to the 8008 and the
8080 are implemented as built-in functions; i.e., they are
equivalent to procedures (and variables, in some cases) which
are declared in an encompassing block level hidden from the
programmer. Lloyd and Van Dam [38] recognized that this is an
important concept, and the method by which it is implemented is
explained further in Chapter IV.
The built-in function approach is probably preferable in
firmware applications to the extensible language approach,
although this will probably be a topic of considerable debate
for many years. An extensible language is essentially one which
has a more sophisticated macro capability than PL/M. It allows
the programmer to define new instructions and redefine the base
instructions of the language. This may seem to be a desirable
feature, but unfortunately it violates the principle of
uniformity. Halstead observed that "... the extensible-language
approach ... seemed to open the door to a dangerous,
undisciplined proliferation of overlapping and even incompatible
dialects within a single installation ...." [23, p. 214]

B.   POTENTIAL  MODIFICATIONS 

In order for PL/M to serve as a useful general-purpose,
machine-independent programming language for firmware design, it
will probably be necessary to make some slight modifications.
The changes described below were suggested by study of other
programming languages which are similar in structure to PL/M,
with particular attention being paid to the C language [49].
This language has relative merits and shortcomings when compared
with PL/M, but it is a good system programming language which
generates efficient machine code for the PDP-11 series of
minicomputers. Most of the items listed below are convenience
features rather than necessities. (Of course, a major advantage
of high-level languages is their convenience when compared with
assembly languages.) Many of them were not implemented in the
original versions of PL/M, probably because they tend to lead to
less efficient machine code, but most of them would not be
difficult to implement and some might even allow more efficient
code to be generated. The optimization techniques discussed in
Section VII.B would be of benefit for those features with
apparent inefficiencies.

One major weakness of PL/M is its paucity of data types. If the
language were to be used as the basis of a firmware design
system, it would need at least a concept of floating point
variables in order to be widely accepted. Other desirable data
types include string and substring, bit, double precision, and
complex. It would also be convenient to have the capability to
define data structures. Implementation of some of these various
data types would probably suggest the need for a few new
instructions for manipulating them efficiently. For example,
double precision arithmetic instructions and string
concatenation instructions would be useful.

For algorithms involving array arithmetic it would be desirable
to have the capability to declare arrays of dimension greater
than one. A related feature is the ability to declare arrays
with variable lower bounds, as in ALGOL W.

Recursion is another feature which PL/M lacks; however, this may
not be significant for firmware design applications. Recursion
allows compact expression of an algorithm but is not a necessary
feature in a language, since a recursive procedure may be
rewritten as an iterative procedure. Besides, recursion usually
sacrifices execution efficiency for programming efficiency, and
great care must be taken in writing recursive procedures in
order to ensure that they do not "blow up."
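
The equivalence of the two forms can be illustrated in C with the factorial function, chosen here only because it is short. The function names and the limit MAX_N are assumptions made for this sketch; the limit guards against overflow of a 32-bit result, one simple way in which such a routine can "blow up."

```c
#include <assert.h>

/* Largest argument whose factorial fits in 32 bits (12! = 479001600). */
#define MAX_N 12

/* Recursive form: compact, but every call consumes stack space. */
unsigned long fact_recursive(unsigned n)
{
    assert(n <= MAX_N);
    return n <= 1 ? 1UL : n * fact_recursive(n - 1);
}

/* Iterative form: the same result from a loop, with no nested calls. */
unsigned long fact_iterative(unsigned n)
{
    unsigned long result = 1;
    unsigned i;

    assert(n <= MAX_N);
    for (i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

The iterative version needs one extra local variable but no return addresses or saved parameters, which matters on a machine with a small stack.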

A feature which would prove very useful, especially in large
system development, is the ability to link independently
compiled and tested segments of a program. The current Intel
versions of the PL/M language do not allow this, since the
second pass of each compiler produces absolute machine code. The
implementation of this feature would require the declaration of
"global" or "external" variables, the redesign of pass 2 to
produce relocatable object code, and the design of a linking
loader.

As will be discussed in Chapter VI, the trends in digital
architecture development have encouraged, among other features,
inclusion of multiple high-speed registers and fast
increment/decrement instructions. One way in which to allow the
high-level language programmer to take advantage of such
features is to provide special constructs within the language.
For example, he could be allowed to declare frequently
referenced variables to be "register" variables in order to
increase execution speed (and also produce a slight saving in
the amount of main memory required). The programmer could also
be allowed to write statements such as

    i = ++j - k;
or

    i = j-- - k;
in order to take advantage of the increment and decrement
instructions. The first statement above would generate code to
increment "j," subtract the value of "k" from the new value of
"j," and store the result in "i"; while the second statement
would generate code to subtract the value of "k" from the value
of "j," store the result in "i," and then decrement "j."

Both the register declaration and increment/decrement features
are available in the C language. The increment/decrement feature
should be really just a convenience for the programmer, since
the same statements could be written in C as

    i = (j = j + 1) - k;
and

    i = j - k; j = j - 1;
and the compiler should generate the same code as for the
previous two statements (unfortunately it does not; see Section
VII.B.1). The register declaration in C does result in more
efficient code being generated; however, there may be other ways
to solve the register allocation problem. This point is
discussed further in Section VII.B.2.
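
That the short and long forms compute the same values can be checked directly in C. The four helper functions below are hypothetical wrappers written only for this comparison; "j" is passed by address so that the side effect on it can be observed by the caller.

```c
#include <assert.h>

/* i = ++j - k : increment j first, then subtract k from the new value. */
int preinc_short(int *j, int k)  { return ++*j - k; }
int preinc_long(int *j, int k)   { return (*j = *j + 1) - k; }

/* i = j-- - k : subtract k from the old value of j, then decrement j. */
int postdec_short(int *j, int k) { return (*j)-- - k; }
int postdec_long(int *j, int k)
{
    int i = *j - k;
    *j = *j - 1;
    return i;
}
```

Starting from j = 5 and k = 2, both pre-increment forms yield 4 and leave j at 6, while both post-decrement forms yield 3 and leave j at 4. Whether a compiler emits identical machine code for each pair is a separate question.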

Another potential change in PL/M would involve the addition of
the conditional expression. This would enable the statement

    if a < b then c = a; else c = b;
to be rewritten more concisely as

    c = if a < b then a else b;
and could be done by merely adding a few more productions to the
grammar (see Section IV.B.1). This change would not increase the
efficiency of the generated code.
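
The C language already provides a conditional expression, written with "?" and ":", so the two forms can be placed side by side. The function names below are assumptions made for this sketch.

```c
#include <assert.h>

/* Statement form, corresponding to: if a < b then c = a; else c = b; */
int min_statement(int a, int b)
{
    int c;

    if (a < b)
        c = a;
    else
        c = b;
    return c;
}

/* Expression form, corresponding to: c = if a < b then a else b; */
int min_expression(int a, int b)
{
    return a < b ? a : b;
}
```

Both functions return the smaller of their arguments; the second simply lets the choice appear inside a larger expression.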

One final area for potential modification involves the CASE
statement and will probably require a great deal more study than
some of the changes suggested above. The CASE construct in PL/M
can be awkward and error-prone in some situations, as
illustrated in Figure 3(a). It should be noted that it would
have been very easy for the programmer to incorrectly count the
number of semicolons in this section of code. Also he has had to
resort to the much-maligned GOTO in order to share code between
two of the cases, and his only control over an out-of-range
value of "c" is to make a test before entering the CASE group.

Figure 3(b) shows how the same routine would be implemented if
the C language SWITCH statement were available in PL/M. In this
"case" there is no need to count semicolons. The use of the GOTO
is avoided, since the cases may be listed in any order, and the
BREAK is used to exit from the group. Also there is a specific
default case to ensure that appropriate action is taken for all
values of "c."

Both the CASE and the SWITCH constructs have advantages and
disadvantages in comparison with one another. The CASE
statement, despite the drawbacks noted above, will produce more
efficient code in many situations and should not be discarded in
favor of the SWITCH. Ross, et al., highlighted one of the
tradeoffs involved when they noted that

    ... to ensure completeness of case statement control a
    programmer should be permitted by the syntax to specify
    what should happen when a case statement variable is out
if c < 'a' or c > 'u' then call err;
do case c - 'a';
    ; ; ;
    togd = true;                  /* case 'd' */
    ; ; ; ; ; ; ;
    do;                           /* case 'l' */
        l = true;
        go to lab1;
    end;
    ; ; ;
    togp = true;                  /* case 'p' */
    ; ; ;
    togt = true;                  /* case 't' */
    lab1: do;                     /* case 'u' */
        lim = 0;
        do while (c := getc) >= '0' and c <= '9';
            lim = lim * 10 + c - '0';
        end;
        if l then liml = lim;
        else limu = lim;
    end;
end /* case group */;

                    (a)


do switch c;
    case 'd':  togd = true; break;
    case 'l':  l = true;
    case 'u':  lim = 0;
               do while (c := getc) >= '0' and c <= '9';
                   lim = lim * 10 + c - '0';
               end;
               if l then liml = lim;
               else limu = lim; break;
    case 'p':  togp = true; break;
    case 't':  togt = true; break;
    default:   call err;
end;

                    (b)


Figure 3.   (a) PL/M code using the CASE construct,
(b) PL/M code using the SWITCH construct
    of range. Confirmability applied to the same issue would
    imply a programmer should be required to state what should
    happen. Of course, if he knows that out of range values
    are not possible, this too should be expressible, to
    permit implementation efficiency. [51, p. 23]

Vaughn [58] has suggested that both facilities could be provided
in the same language in the form of a generalized IF statement.
A simpler alternative would seem to be to incorporate the
structure of Figure 3(b) into the present PL/M language along
with the CASE statement.
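
For comparison, the C statement on which the proposed SWITCH construct is modeled can be sketched as follows. This is a hedged illustration rather than a translation of any program in this thesis: the structure "options," the function handle_option, and the convention of reading digits from a text pointer are all assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the settings selected by option letters. */
struct options {
    bool togd, togp, togt;   /* simple on/off toggles         */
    int  liml, limu;         /* lower and upper limits        */
    bool err;                /* set when the letter is unknown */
};

/* Act on option letter c. For 'l' and 'u' a decimal number is read
   from *text, which is advanced past the digits consumed. */
void handle_option(int c, const char **text, struct options *o)
{
    bool lower = false;
    int lim;

    assert(text != NULL && *text != NULL && o != NULL);
    switch (c) {
    case 'd': o->togd = true; break;
    case 'l': lower = true;          /* falls through: shares code with 'u' */
    case 'u':
        lim = 0;
        while (**text >= '0' && **text <= '9')
            lim = lim * 10 + (*(*text)++ - '0');
        if (lower) o->liml = lim; else o->limu = lim;
        break;
    case 'p': o->togp = true; break;
    case 't': o->togt = true; break;
    default:  o->err = true;         /* any value not listed above */
    }
}
```

The cases appear in whatever order suits the sharing of code, no placeholder statements are needed for unused letters, and the default label handles every value that is not listed.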
IV.   PASS  1  IMPLEMENTATION


Aho and Ullman [3] have suggested that the process of
compilation is composed of seven subprocesses: lexical analysis,
error analysis, bookkeeping, parsing, translation (to an
intermediate form), code optimization, and object code
generation. While it may be difficult to identify all seven of
these subprocesses in any given compiler and their order may not
be the same as that given, this is a good conceptual model.
Figure 4 shows how each of the parts of this model is related to
the others [3, p. 74].

This chapter documents the initial stages of the design of a
compiler for user-definable architectures. All but the code
optimization and code generation phases of this model were
implemented. The latter two phases are discussed in Chapters VII
and VIII, and suggestions are given there for their
implementation. Recommendations for continuation of the design
are given in Chapter IX.

A.   THE  FORTRAN  VERSION

In order to gain insight into the analysis of some of the
problems presented in other sections of this thesis, it was felt
that some practical experience in compiler implementation was
desirable. For reasons given in Chapter III the PL/M
microprocessor language was chosen for this purpose, and an
attempt was made to implement the commercial FORTRAN version on
a Digital Equipment Corporation PDP-11/50 computer with an
interactive operating system. Unfortunately this proved
unfeasible, and attention was shifted to writing another version
of pass 1 of the compiler in a system programming language. The
latter effort was successful, and a full account is given below
in Section IV.B.

Figure 4.   Model of a compiler (after [3])

The main reason for the failure of the FORTRAN implementation
was that it required more primary memory than was available on
the PDP-11. When run on the IBM S/360 at the Naval Postgraduate
School, pass 1 of the PL/M compiler requires approximately 120K
bytes of memory. On the PDP-11 only about 56K bytes of user
memory were available, and if all of the object modules could
have been linked and loaded together it is estimated that they
would have occupied about 100-110K bytes of memory.

An attempt was made to divide the routines in such a way that
several sub-passes could be generated, each requiring less than
56K bytes; however, there turned out to be too much
interdependence among the routines, and there was always at
least one partition which required more memory than was
available. This was because the synthesis routine (the one which
generates the intermediate language code) required about 50K
bytes by itself, and it required many other routines to be
loaded with it.

Another problem which developed involved the discovery that the
data initialization statements in the Intel PL/M compiler do not
conform to ANSI standard FORTRAN specifications (although it is
claimed that the compiler is written in standard FORTRAN to
enhance transportability). The FORTRAN compiler used for this
project accepts only programs written in standard FORTRAN. It
requires each variable initialized in a DATA statement to be
named individually, and this caused problems with the vast
number of vectors which are initialized in the BLOCK DATA
routine. This by itself was not a critical problem and could
have been overcome without too much difficulty if there had been
justification to continue working with the FORTRAN version.

After the first attempts to partition pass 1 of the compiler
failed, there were two alternatives available. Either a more
concerted effort could have been made to subdivide the FORTRAN
version, or a completely new version could have been attempted
in a more efficient language. After considering the amount of
effort which would be involved in working with the FORTRAN
version and the inherent inefficiencies entailed in running it
on a 16-bit machine (e.g., it assumes 32-bit integers) it was
decided that it would be simpler and more beneficial in the long
run to write another compiler.

B.   THE  C  VERSION 

After the FORTRAN version was abandoned, pass 1 of the PL/M
compiler was successfully implemented using the compiler writing
facilities supported by the UNIX operating system [50]. Since a
secondary objective of the project was to develop a system for
experimenting with compiler design, it proved worthwhile to
utilize these more efficient facilities. Because of the time
constraints placed upon this project only pass 1 of the compiler
was implemented; however, much valuable experience was gained in
the process, and a great deal of this thesis has been influenced
by the results obtained.
1.   YACC 

For many years compiler writing was more of an art than a
science, but many important developments have taken place over
the last decade to reverse this situation. Some of the most
impressive of these developments have been in the area of formal
language theory and automatic parser generation. "The ability to
generate parsers from a syntactic description of a language is
an important consideration in reducing the cost of developing
reliable translators." [60, p. 34]

The parser generator in the UNIX system is known as YACC (Yet
Another Compiler-Compiler) [30]. It has been in use for about
two years at Bell Laboratories where, among other things, it has
been utilized in the development of an easy-to-use language for
a sophisticated mathematics typesetting system [32]. Input for
YACC consists of a syntactic and semantic description of the
grammar of the language for which a parser is desired.
Appropriate languages belong to the class known as LALR [2,3,4],
or look-ahead LR, since they read text from the left, perform a
right-parse, and resolve conflicts by looking ahead in the text
stream. This is a very broad and useful subset of the
context-free languages, and one which includes PL/M. YACC checks
the grammar for conflicts and, if none exist, produces a set of
parse tables for the language.

The semantics associated with each production in the grammar are
transformed by YACC into a C program which contains the parse
tables as data. When this program has been compiled it is linked
with a parse table interpreter, provided by the YACC library,
and any other programs which have been written by the compiler
designer. Actually YACC provides only the core of the compiler:
the parser (which performs the syntactic analysis function of
Figure 4) and a means of communication between the parse stacks
and the programs provided to perform the other functions
(lexical analysis, error analysis, bookkeeping, and code
translation) of the code generation process.

File "m.gram" in Appendix B contains the YACC input for PL/M.
The syntactic notation is somewhat different from the normally
encountered BNF, in which a production might be written as

    <NONTERM1> ::= <NONTERM2> TERMINAL <NONTERM3>
rather than the YACC version

    nonterm1:  nonterm2 'terminal' nonterm3
in which all terminal symbols are quoted unless they have been
declared to be terminals (as have "identifier," "number," and
"string" on the first line of "m.gram"). The convention of using
a vertical bar ("|") to indicate the beginning of a production
with the same left side as the immediately preceding production
has been retained from BNF. The semicolon (";") is used to
indicate the end of a set of productions with the same left
side. It should be noted that a quoted semicolon ("';'") may
occur within a production as a terminal symbol.

Semantics are provided by appending an equal sign ("=") followed
by a C language statement (compound statements are enclosed in
braces, "{" and "}") to a production before either the vertical
bar or the semicolon. The procedures used for implementing the
semantics of PL/M are discussed in Section IV.B.5.

The extreme flexibility afforded by the use of an automatic
parser generator such as YACC is demonstrated in Figure 5, which
shows the changes required in the PL/M grammar in order to
implement the conditional expression construct (see Section
III.B). Productions 86 and 87 are currently included in the
compiler implemented for this project, and productions 87a-87c
are the new productions which would have to be added.


expression:    logicalexpression                      /* 86 */
          |    variable ':' '=' logicalexpression     /* 87 */
          |    ifexpression                           /* 87a */
          ;

ifexpression:  trueobject expression                  /* 87b */
          ;

trueobject:    ifclause expression 'else'             /* 87c */
          ;


Figure 5.   Potential syntax changes for adding
the conditional expression to PL/M (see Chapter III)
2.   Data  Structures

One of the first and most important steps in designing a complex
software system is the definition of an appropriate set of data
structures. The principal structures used in the implementation
of the PL/M compiler are described below in order to give a
fuller understanding of the nature of the problem and insight
into the changes which would be necessary in order to expand the
use of the compiler to a more general environment. The
declarations of all of the stacks and tables used can be found
in the file "m.decl" in Appendix B. The macros used in the
declarations are defined in file "m.def."

The two most important data structures used in a modern compiler
are the parse stack and the symbol table. The parse stack in an
LALR parser is used to store input tokens for the "shift" and
"reduce" operations. In general, there are at least two parallel
stacks which contain various pieces of information about the
tokens. YACC provides parse stacks in its parse table
interpreter routine, with one stack being reserved for values
provided by the scanner. The operation of these stacks is rather
complex and will not be considered here, since Aho and Johnson
[2] have provided an excellent survey of the techniques
involved. Since there was a need to retain more than a single
piece of information about each token, and there was no way to
communicate with the parse stacks in the parse table interpreter
other than to provide a single value, it was necessary to
implement four other stacks for this purpose. The operation of
these stacks is discussed in Section IV.B.3.

The symbol table is important for a number of reasons, not the
least of which is the fact that it is used in conjunction with
the intermediate language output to transmit information to the
later passes of the compiler. It usually accounts for the bulk
of the main memory data storage requirements of the compiler and
must therefore be implemented in as efficient a manner as
possible.

The symbol table is a vector of eight-bit bytes which, during
the course of a compilation, consists of a series of entries of
varying types. The format of a general symbol table entry for
the PL/M compiler is shown in Figure 6. This is the type of
entry which is generated for all variables and procedures
declared by the programmer. Reserved words and macro definitions
also are represented by symbol table entries, but the formats of
these entries are slightly different from that shown in Figure
6. The differences are described below, following the
description of the general type.

The first three bytes of the format are common to all three
types of entries and are referred to as fixed information
("finfo" in the programs). The first byte contains the "last"
field, which specifies the number of bytes to the beginning of
the preceding entry and is used for chaining downward through
the table (as, e.g., when printing or dumping the symbol table).
It should be noted that since
[Diagram: fields of a general symbol table entry, labeled
last, type, prec, size, name, hcoll, syno, length, length]

Figure 6.   Format of a general symbol table entry
the "last" field contains only eight bits, each symbol table
entry is limited to 256 bytes, although it will generally be
much shorter. This in turn ultimately limits the lengths of
variable and procedure names and macro definitions, since the
characters for describing these attributes must fit into the
remainder of the entry after the fixed information and other
fields have utilized some of the 256 bytes.

The  second  byte  of  the  entry  contains  three  fields: 
"type,"  "precision  (prec),"  and  "based  (b)."  The  "type" 
field  consists  of  four  bits  and  is  used  to  distinguish  among 
the various types of entries (variable, reserved word, macro,
vector, etc.). The alternatives can be found in file
"m.def." The precision field contains three bits and is most
commonly used to represent the precision (i.e., the number
of bytes required) of variables, vectors, or the result of a
function  procedure  call  (zero  indicating  no  value  returned). 
The  "based"  field,  if  set  to  "1,"  indicates  a  based,  or 
indirect,  variable. 
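
As a minimal sketch of how three such fields can share one byte,
the following Python fragment packs and unpacks a 4-bit type, a
3-bit precision, and a 1-bit based flag. The bit positions chosen
here (type in the high four bits, based in the lowest bit) are an
assumption made for illustration; the text does not state the
actual positions used in the compiler.

```python
def pack_attr(type_code, prec, based):
    """Pack type (4 bits), precision (3 bits) and based (1 bit) into a byte.

    Assumed layout: bits 7-4 type, bits 3-1 precision, bit 0 based.
    """
    if not (0 <= type_code < 16 and 0 <= prec < 8 and based in (0, 1)):
        raise ValueError("field out of range")
    return (type_code << 4) | (prec << 1) | based


def unpack_attr(byte):
    """Inverse of pack_attr: return (type_code, prec, based)."""
    return (byte >> 4) & 0xF, (byte >> 1) & 0x7, byte & 0x1


b = pack_attr(type_code=2, prec=1, based=1)
print(b, unpack_attr(b))   # 35 (2, 1, 1)
```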

Next  is  the  "size"  field,  in  the  third  byte.  This 
field  is  used  to  indicate  the  length  of  the  following  two 
fields,  "name"  and  "hcoll."  The  "name"  field  contains  the 
printname of the symbol and has a length equal to the number
of characters in the name. The "hcoll" field is always two
bytes long and contains the absolute address of the previous
symbol table entry whose printname has the same hash code as
this symbol (see Section IV.B.3). Thus the value of "size"
is  equal  to  the  length  of  the  printname  plus  two. 

During the course of this project it was found that
compiler-generated labels were the only entries which had no
printname, and the entries for these symbols conveyed no
information other than the symbol number (see below). Since
they only took up precious symbol table space (especially
for long programs, which already require a great deal of
space), these entries were eliminated from the symbol table.
Examination of the symbol table in Figure 8 makes it evident
which symbol numbers are used for compiler-generated labels,
since these are the only symbols which do not have entries
(e.g., S25, S28, S29).

Following  the  "hcoll"  field  in  the  general  symbol 
table  entry  is  the  "syno"  field.  This  field  contains  the 
symbol number of the entry. Each time a new symbol is declared
by the programmer an entry of this type is made, and
the next sequential symbol number is assigned. The "syno"
field is ten bits long, and thus there can be as many as
1024 different symbols in any program (including
compiler-generated labels).

The final field in the general entry is the "length"
field, which indicates the number of elements in a vector or
the number of arguments required by a procedure. In the
latter case "length" may be zero, for a procedure with no
arguments, or as large as 63, since a procedure definition
uses only the first six bits of this field. (This restriction
could easily be changed; however, it is doubtful whether
any procedure in a well-written program would have more
than 63 arguments.)

A saving of table space is accomplished by classifying
vectors into two categories, short and long, depending
upon whether or not they contain fewer than 64 elements. In
the case of short vectors (distinguished from long vectors
by the "type" field) and variables, the byte containing the
last eight bits of the "length" field is deleted, as discussed
in Section IV.B.3.
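
The space saving can be sketched as follows. This Python fragment
is illustrative only: it assumes that the ten-bit "syno" and the
first six bits of "length" share one 16-bit word, that those six
bits are the low-order bits of the length, and that the optional
third byte carries the remaining eight bits. None of these
placement details is stated in the text, and the function names
are hypothetical.

```python
def encode_tail(syno, length):
    """Encode "syno" and "length" in two bytes (short) or three (long)."""
    if not 0 <= syno < 1024:
        raise ValueError("syno needs more than ten bits")
    if not 0 <= length < (1 << 14):
        raise ValueError("length needs more than fourteen bits")
    word = (syno << 6) | (length & 0x3F)
    tail = bytes([word >> 8, word & 0xFF])
    if length >= 64:                      # long vector: keep the extra byte
        tail += bytes([length >> 6])
    return tail


def decode_tail(tail):
    """Inverse of encode_tail: return (syno, length)."""
    word = (tail[0] << 8) | tail[1]
    length = word & 0x3F
    if len(tail) == 3:
        length |= tail[2] << 6
    return word >> 6, length


print(len(encode_tail(52, 16)))   # 2  (a 16-element vector is "short")
print(len(encode_tail(60, 98)))   # 3  (a 98-element vector is "long")
```

The two sample calls use the symbol numbers and element counts
listed for "temp" and "heading" in Figure 8.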

Figure 7 shows the changes required in the general
format of Figure 6 for reserved words, macro definitions,
and based variables. As indicated, all fields from "last"
through "hcoll" remain as in the general format. Figure
7(a) indicates that the entry for a reserved word (e.g.,
"do," "for," "while") has one additional byte, the "resno"
field, containing the reserved word number, which is important
in the parsing process. Since this field contains only
eight bits, there can be no more than 256 reserved words in
the language. Following the "hcoll" field in the entry for
a macro definition (Figure 7(b)) are the "msize" and "mdef"
fields, the former giving the number of characters in the
definition (restricted to a maximum of 255) and the latter
containing the definition. For a based variable the "based"
field contains a "1," and there is a "bsyno" field inserted
between the "syno" field and the "length" field, as shown in
Figure 7(c). The "bsyno" field contains the symbol number
of the variable which serves as the base. The six unused
bits in this type of entry are wasteful, but most programs
do not contain many based variables.

[Figure 7 diagram: (a) fields "last" through "hcoll" followed by
"resno"; (b) fields "last" through "hcoll" followed by "msize"
and "mdef"; (c) "syno" with six unused bits, followed by "bsyno"
and "length"]

Figure 7.   Format modifications for
(a) reserved words, (b) macro definitions,
(c) based variables

Now that the various fields in a symbol table entry
have been explained, it should prove useful to look at an
example. Figure 8 shows the symbol table which was constructed
by the PL/M compiler for the square root program of
Figure 1. Each line of the printed table corresponds to one
entry in the symbol table. The reserved words, which are
stored immediately below symbol S0, are not shown in this
table. It should be noted that there are no "syno,"
"based," "precision," or "length" field entries for macro
definitions. The "name" column of the table contains the
printnames of all entries and the "msize" and "mdef" fields
for macros.

A very important point to note here is that symbols
S0-S22 were not declared in the square root program but are
the variables and procedures which relate PL/M to the Intel
8080. These symbols were placed into the symbol table during
the initialization of the compiler and can be considered
to have been declared in an outer block encompassing the
square root program. The manner in which this was done can
be seen by examining file "m.main.c" in Appendix B. Since
it is very easy to change the names and attributes of these
symbols in "m.main.c" it is also very easy to tailor the
language to the architecture of the machine for which the
object code is to be generated (see Chapter III). The meanings
of these symbols need not be of concern during the
first pass of the compilation but will be important during
later passes.

Syno   B Pr Len   Type   Size   Name

S67      1 10      12     13    monitoruses
S60      1 98      11      9    heading
                    1      6    crlf  5  cr,lf
S58      2 1        2      3    i
S52      1 16      12      6    temp
S51      1          2      3    j
S50      1          2      3    i
S48      1          2     14    zerosuppress
S47      1          2      7    chars
S46      1          2      6    base
S45      2          2      8    number
S44      0 4        6     13    printnumber
S41    * 1          2      6    char  Based S37
S40      1          2      3    i
S38      1          2      8    length
S37      2          2      6    name
S36      0 2        6     13    printstring
S33      1          2      3    i
                    1      9    bitcell  2  91
S31      1          2      6    char
S30      0          6     11    printchar
S27      2          2      3    z
S26      2          2      3    y
S24      2          2      3    x
S23      1          6     12    squareroot
                           7    false  1  0
                           6    true  1  1
                           4    lf  3  0ah
                           4    cr  3  15q
                           5    tto  1  2
S22      2 1        7      0
S21      2 2        8      5    dec
S20      2 2        8      8    double
S19                 8      6    move
S18                 8      6    last
S17                 8      8    length
S16                 8      8    output
S15                 8      7    input
S14                 8      5    low
S13                 8      6    high
S12      0          8      6    time
S11      2          8      5    scr
S10      2          8      5    scl
S9       2          8      5    shr
S8       2          8      5    shl
S7       2          8      5    ror
S6       2          8      5    rol
S5       2          7     10    stackptr
S4                  7      6    memory
S3                  7      8    parity
S2                  7      6    sign
S1                  7      6    zero
S0                  7      7    carry

Figure 8.   PL/M symbol table
for the program of Figure 1

3.   The Parser

The main function of pass 1 of the PL/M compiler is
to convert the source language program into a form which can
be used by remaining stages of the compiler to generate
machine code. The source language program is represented in
the computer as a linear string of ASCII characters, organized
as a series of "identifiers," "numbers," and "strings"
(all called "tokens"). This series of tokens is the "text
stream" for the compiler. In order to perform the translation,
pass 1 must parse the program; i.e., it must examine
the text stream and determine which of the rules of the PL/M
grammar can be applied in order to reduce the tokens to a
"statement list" and finally to a "program" (see file
"m.gram" in Appendix B). This section contains an overview
of the parsing and symbol table functions of pass 1.

When the parser provided by YACC requires a token
from the text stream, it calls the user-provided routine
"yylex." (In this section "user" refers to the compiler
designer rather than the firmware designer.) This routine
and the routines which it calls are listed in file
"m.scan.c" (Appendix B). "yylex" calls "gettoken," which
constructs tokens from the input characters, determines
which of the three types of tokens (or a special
character, e.g., comma, semicolon) it has found, and computes
a hash code for each identifier. The latter function
is accomplished by forming the sum, modulo 128, of the ASCII
values of the characters in the printname of the identifier.
This hash code is used by "yylex" later for looking up the
identifier in the symbol table.
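
The hash computation just described is small enough to state
directly. The following Python sketch (illustrative only; the
function name is hypothetical, and the compiler's own version is
part of "gettoken" in "m.scan.c") forms the sum of the ASCII
values of the printname, modulo 128.

```python
def hash_code(printname):
    """Sum of the ASCII values of the characters, modulo 128."""
    return sum(printname.encode("ascii")) % 128


print(hash_code("temp"))                    # 54
print(hash_code("ab"), hash_code("ba"))     # 67 67
```

Since the sum does not depend on character order, names which are
rearrangements of one another always collide. Collisions of this
kind are resolved through the "hcoll" chain of the symbol table
entries.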

The vector "varc" (Figure 9) is used by "gettoken"
to accumulate characters from the input string. Several
tokens may be accumulated in "varc" before being used by the
parser, and the variable "tokindex" is used to indicate the
element of "varc" which is the beginning of the current
"accumulator." The first byte of each accumulator contains
the length of the token, thus limiting the length of each
token to no more than 254 characters. Since the length of
"varc" is normally less than 255, and it may contain more
than one token, the upper bound on the length of a token is
usually much less than 254.
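
A minimal sketch of such a length-prefixed accumulator is given
below in Python, for illustration only. The buffer size, the class
name, and the method names are hypothetical; only "varc,"
"tokindex," the leading length byte, and the 254-character limit
are taken from the description above.

```python
VARC_SIZE = 64   # hypothetical buffer size


class TokenBuffer:
    def __init__(self):
        self.varc = bytearray(VARC_SIZE)
        self.tokindex = 0            # start of the current accumulator

    def accumulate(self, text):
        """Store one token; return the index of its accumulator."""
        data = text.encode("ascii")
        start = self.tokindex
        if len(data) > 254 or start + 1 + len(data) > len(self.varc):
            raise OverflowError("token does not fit in varc")
        self.varc[start] = len(data)                       # length byte
        self.varc[start + 1:start + 1 + len(data)] = data  # characters
        self.tokindex = start + 1 + len(data)              # next free location
        return start

    def token_at(self, start):
        """Recover the token whose accumulator begins at start."""
        n = self.varc[start]
        return self.varc[start + 1:start + 1 + n].decode("ascii")


buf = TokenBuffer()
first = buf.accumulate("heading")
second = buf.accumulate("crlf")
print(first, second, buf.token_at(second))   # 0 8 crlf
```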

[Figure 9 diagram: the parallel stacks "symloc," "hash," "fixv,"
and "var"]

Figure 9.   Scanning and parsing stacks

Once "gettoken" has completed its functions, control
returns to "yylex," which may take one of several sets of
actions, depending on the type of token scanned. If an end
of file character or other special character was scanned,
"yylex" returns the character to the parser. If a number
was scanned, "yylex" reports this to the parser and returns
the value of the number. If either a string or an identifier
was scanned, "yylex" "pushes" information onto the
user-controlled parsing stacks (Figure 9). (The stack
manipulation routines are listed in file "m.aux.c.") In the
case of a string, the stack pointer ("sp") is incremented,
"var[sp]" is assigned the current value of "tokindex," and
"tokindex" is advanced to the value of the next free location
in "varc." The fact that a string was scanned is
reported to the parser along with the current value of "sp,"
as discussed in Section IV.B.5.

The actions for an identifier are somewhat more complicated,
since the identifier may be a reserved word, macro
call, or programmer-defined word. In order to determine
which case is applicable, the identifier is looked up in the
symbol table by finding its address in the element of the
vector "hentry" given by the hash code computed by "gettoken."
If the address is other than the zeroth element of the
symbol table, the printname stored in "varc" is compared
with the printname stored in the table entry. If the names
do not match, the value of "hcoll" is used as the next address
in the search. This process continues until either a
match is found or the current value of "hcoll" is the address
of the zeroth element of the symbol table. If an
entry for the identifier is located in the symbol table, the
"type" field is examined to determine whether it is a
reserved word or a macro. In the former case, the reserved
word number is returned to the parser. In the latter case,
the scanner is set up to begin reading input characters from
the "mdef" field of the symbol table entry, and "gettoken"
is called again.
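
The search along the collision chain can be sketched as follows.
This Python fragment is illustrative only: the symbol table is
modelled as a dictionary from entry address to a small record
rather than as packed bytes, address 0 stands for the zeroth
element that terminates every chain, and the function names are
hypothetical.

```python
def hash_code(name):
    return sum(name.encode("ascii")) % 128


def lookup(name, hentry, table):
    """Return the address of the entry for name, or None if absent.

    hentry[h] holds the address of the newest entry with hash code h;
    each entry's "hcoll" holds the address of the previous entry with
    the same hash code; address 0 ends the chain.
    """
    addr = hentry[hash_code(name)]
    while addr != 0:
        entry = table[addr]
        if entry["name"] == name:
            return addr
        addr = entry["hcoll"]
    return None


# "ab" and "ba" share hash code 67, so they sit on one chain.
table = {10: {"name": "ab", "hcoll": 0},
         20: {"name": "ba", "hcoll": 10}}
hentry = [0] * 128
hentry[67] = 20
print(lookup("ba", hentry, table), lookup("ab", hentry, table),
      lookup("zz", hentry, table))   # 20 10 None
```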

If the identifier is neither a reserved word nor a
macro, information about it is "pushed" onto the parsing
stacks in the manner discussed above for strings. In addition
to the information stored in "var," the address of the
symbol table entry is stored in "symloc," and the hash code
is stored in "hash" (see Figure 9). The "fixv" stack is
used to hold other types of information during the parsing
process. The fact that an identifier was scanned is reported
to the parser along with the current value of "sp."

If the symbol table search is unsuccessful, an entry
must be made using the routines in file "m.sym.c." The entry
is made immediately following the most recent previous entry.
The "hcoll" field of the new entry is set to the value
of "hentry[hashcode]," and the value of "hentry[hashcode]"
is changed to the address of the new entry. It is assumed
at  this  time  that  both  parts  of  the  "length"  field  will  be 
required.  If  it  is  discovered  (during  the  parsing  of  later 
text)  that  only  the  first  six  bits  of  the  "length"  field  are 
needed,  the  "compress"  routine  (file  "m.sym.c")  must  be 
called  to  remove  the  extra  byte  in  order  to  save  space  in 
the  table. 
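
The linking step performed when a new entry is made can be
sketched in a few lines. The Python fragment below is a simplified
illustration, not the "enter" routine of "m.sym.c": entries are
kept in a dictionary rather than as packed bytes, the class and
method names are hypothetical, and the entry size used to advance
the table top is only an estimate (three fixed bytes, the name,
two bytes of "hcoll," and three bytes for "syno" and both parts of
"length").

```python
class SymbolTable:
    def __init__(self):
        self.hentry = [0] * 128   # newest entry address per hash code
        self.entries = {}         # address -> (name, hcoll)
        self.top = 1              # next free address; 0 ends every chain

    def enter(self, name):
        """Append an entry for name and put it at the head of its chain."""
        h = sum(name.encode("ascii")) % 128
        addr = self.top
        self.entries[addr] = (name, self.hentry[h])   # hcoll = old chain head
        self.hentry[h] = addr                         # new entry heads the chain
        self.top += 3 + len(name) + 2 + 3             # assumed full-size entry
        return addr


st = SymbolTable()
a = st.enter("ab")
b = st.enter("ba")                 # same hash code as "ab"
print(a, b, st.entries[b])         # 1 11 ('ba', 1)
```

Placing each new entry at the head of its chain means that the
most recently declared symbol with a given name is found first
during a search.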

The  parser  itself  uses  tables  generated  from  the 
grammar  (file  "m.gram")  by  YACC  in  order  to  perform  the 
translation  from  the  source  language  to  the  intermediate 
language  (see  Chapter  V).  It  does  this  by  shifting  tokens 
onto a set of parsing stacks hidden from the compiler
designer. When the tokens match one of the rules of the
grammar, a reduction is made by replacing the tokens on the
stack with the symbol on the left side of the rule (or production).
The methods used for detecting and recovering
from errors in the input and the techniques for generating
the intermediate language code are discussed in the next two
sections.

4.   Error Recovery

In the discussion of the scanner in Section IV.B.3
it was assumed that the input stream constituted a valid
PL/M program. Unfortunately, this is not always the case,
especially in the early stages of program development. In
addition  to  the  other  tasks  which  a  scanner  must  perform, 
therefore,  it  must  be  able  to  detect  errors  and  report  them 
to  the  programmer.  One  measure  of  a  good  compiler  is  its 
ability  to  accurately  report  all  program  errors. 

Debugging of a large program would be greatly inhibited
if the compilation terminated after the detection of a
single error. Thus it is desirable for the scanner to have
error recovery mechanisms which enable it to continue processing
after detecting and reporting an error. The error
handling and recovery techniques included in the YACC version
of the PL/M compiler are discussed in this section.

There are three basic kinds of errors which may
appear in a program--logic, syntactic, and semantic. Logic
errors are errors in the programmer's thought processes
which cause him to write statements which do something other
than what he intended. For example, he might write an expression
incorrectly or use the wrong indexing variable when
working with a vector. It is impossible for a compiler to
detect errors of this type unless they also result in syntactic
or semantic errors.

Syntactic errors result from the violation of the
grammatical rules of the language. The rules for a
programming language like PL/M are given in terms of a
series of productions, as in file "m.gram" in Appendix B.
One of the main advantages of using a parser derived from
such a grammar is that it immediately detects and reports
syntactic errors.

Semantic  errors  are  errors  which  do  not  violate  the 
rules  of  the  language  but  which  do  not  have  any  meaning  (or 
have  an  incorrect  meaning)  in  the  language.  It  is  easy  to 
write  nonsense  sentences  in  English  which  are  grammatically 
correct.  An  example  of  a  semantic  error  in  a  programming 
language  is  the  use  of  a  variable  before  it  is  declared. 
Some languages allow this, but in the current YACC version
of the PL/M compiler this is not allowed, since proper symbol
table entries are made only for declaration statements.

At  this  point  it  should  be  helpful  to  look  at  an 
example.  Figure  10  lists  the  sample  PL/M  program  of  Figure 
1  with  several  errors  intentionally  introduced.  When  this 
program  was  run  through  the  compiler  the  output  was  as  shown 
in Figure 11. It should be noted that there are two basic
types of errors identified in the output. Syntactic errors
are identified by the term "syntax error," while semantic
errors are identified by the term "compile error."

In order to allow the parser to continue scanning
the input after a syntactic error is encountered, YACC allows
an "error" production to be included in the grammar.
Production 18 in file "m.gram" in Appendix B is the error
production used for the PL/M compiler. In this production

 1.  2048:    /* is the origin of this program */
 2.  declare tto literally '2', cr literally '15q',
 3.      lf literally '0ah',
 4.      true literally '1', false literally '0';
 5.
 6.  squareroot: procedure(x byte;
 7.      declare (x,y,z) address;
 8.      y = x; z = shr(x+1,1);
 9.          do while y <> z;
10.          y = z; z = shr(x/y + y + 1, 1);
11.          end;
12.      return y
13.      end squareroot;
14.
15.  print$char: procedure(char);
16.      declare bit$cell literally '91',
17.          (char,i) byte;
18.      output(tto) = 0;
19.      call time(bit$cell);
20.          do i = o to 7;
21.          output(tto) = char;  /* data pulses */
22.          char = ror(char,1);
23.          call time(bit$cell;
24.          end;
25.      output(tto) = 1;
26.      call time(bit$cell + bit$cell);
27.      /* automatic return is generated */
28.      end print$char;
29.
30.  print$string: procedure(name,length);
31.      declare name address,
32.          (length,i,char based name) byte;
33.          do i = 0 to length - 1
34.          call print$char(char(i);
35.          end;
36.      end print$string;
37.
38.  print$number: procedure(number,base,chars,zero$suppress);
39.      declare number address,
40.          (base,chars,zero$suppress,i,j) byte;
41.      declare temp (16) byte;
42.      if chars > last(temp) then chars = last(temp);
43.          do i = 1 to chars;
44.          j = number mod base + '0';
45.          if j > '9' then j = j + 7;
46.          if zero$suppress amd i <> 1 and number = 0 then
47.              j = ' ';
48.          temp(length(temp)-i) = j;
49.          number = number / base;
50.          end;
51.      call print$string(.temp + length(temp) - chars, chars);
52.      end print$number;
53.
54.  declare i address,
55.      crlf literally 'cr,lf',
56.      heading data (crlf,lf,lf,
57.      ' table of square roots',
58.      crlf,lf,
59.      ' value  root value  root value  root value  root',
60.      ' value  root',
61.      crlf,lf);
62.
63.  /* silence tty and print computed values */
64.  output(tto) = 1;
65.      do i = 1 to 1000;
66.      if i mod 5 = 1 then
67.          do; if i mod 250 = 1 then
68.              call print$string(.heading, length(heading));
69.          else
70.              call print$string(.(cr,lf,2);
71.          end;
72.      call print$number(i, 10, 6, true /* true suppresses
73.                                       leading zeroes */);
74.      call print$number(square$root(i), 10, 6, true);
75.      end;
76.
77.  declare monitor$uses (10) byte;
78.  eof

Figure 10.   PL/M square root
program with errors

syntax error, line 6, on input: byte
syntax error, line 13, on input: end
compile error, line 20 : variable undeclared
compile error, line 20 : identifier cannot be a variable
syntax error, line 23, on input: ;
syntax error, line 34, on input: call
compile error, line 35 : identifier required
syntax error, line 36, on input: end
compile error, line 46 : variable undeclared
syntax error, line 46, on input: identifier
syntax error, line 70, on input: ;

Figure 11.   Compiler output
for program of Figure 10

"error" is a reserved terminal symbol name, and it causes a
state to be included in the parser which will be entered any
time an invalid symbol is scanned.

When an error is seen, the currently active states are
popped, one by one, until a state is reached which has a
shift on error. This shift is then done, and the reduction
performed. The user may specify an action, to do
things such as position the input string and repair the
symbol table. After this reduction is done, a flag is
set, and the parser remains in error state until three
input symbols have been successfully shifted. If an error
takes place when the parser is still in error state, the
input symbol is discarded and no new message is produced.
[30, p. 13]

The reason for discarding input symbols if an error occurs
while the parser is still in error state is to prevent a
simple syntactic error from causing an inordinate number of
misleading messages to be generated. Of course, if there
are any actual errors in the text while the parser is in the
error state they will be ignored. For example, in Figure 11
it can be seen that the parser discovered an error at the
beginning of line 34 when it encountered the symbol "call"
without scanning a semicolon. Figure 10 shows that there is
a missing parenthesis at the end of line 34, and this was
not detected by the parser, since it was still in error
state. This is not a serious problem, since the parser
would detect this error on the second compilation attempt,
after the errors detected on the first try were corrected.

The error production used in this compiler causes
the parser to scan until finding a semicolon before attempting
to continue parsing. This was found to be an effective,
although simple, error handling technique. The actions
which must be taken to allow parsing to continue without
overflowing the various stacks and tables can be seen by
examining the listings in Appendix B. Since PL/M is a
statement oriented language rather than a card oriented
language (such as FORTRAN) and statements are usually relatively
short, most errors will be detected by this scheme.
In future work it might prove beneficial to explore additional
schemes, such as scanning to a comma or a close
parenthesis.
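
The effect of scanning to a semicolon can be shown with a small
hand-written stand-in. The Python sketch below is not the YACC
mechanism itself: it uses a hypothetical toy grammar in which every
statement has the form "name = number ;", it does not model the
rule that three symbols must be shifted before a new message is
produced, and the function name and message wording are assumed.
It reports one error per bad statement and resumes after the next
semicolon.

```python
# One predicate per position of the toy statement  name = number ;
EXPECT = [str.isidentifier,
          lambda t: t == "=",
          str.isdigit,
          lambda t: t == ";"]


def check(tokens):
    """Return (number of good statements, list of error messages)."""
    errors = []
    good = 0
    i = 0
    while i < len(tokens):
        for k, accepts in enumerate(EXPECT):
            if i + k >= len(tokens) or not accepts(tokens[i + k]):
                bad = tokens[i + k] if i + k < len(tokens) else "end of input"
                errors.append("syntax error on input: " + bad)
                # Recovery: discard tokens up to and including the next ";".
                i += k
                while i < len(tokens) and tokens[i] != ";":
                    i += 1
                i += 1
                break
        else:
            good += 1        # all four positions matched
            i += 4
    return good, errors


tokens = ["y", "=", "1", ";", "z", "=", "=", "2", ";", "x", "=", "3", ";"]
print(check(tokens))   # (2, ['syntax error on input: ='])
```

Only one message is produced for the faulty second statement, and
the third statement is still checked, which is the behavior sought
from the error production.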

The actions required for detecting and reporting
semantic errors are not as easy to specify as are those for
syntactic errors, since they are scattered throughout the
grammar. For each production in the grammar the compiler
designer must consider the meaning of any actions which are
to be taken and what circumstances will cause the actions
to be incorrect. For example, the discussion in Section
IV.B.2 points out that a procedure may have no more
than 63 arguments. Thus the actions associated with the
parsing of a "parameter list" (production 42 in file
"m.gram") must include a check for the number of arguments.
Since it is very difficult to check all possible error conditions
of this type, semantic errors are much more difficult
to detect effectively than are syntactic errors. In
Figure 11, for example, it can be seen that the error on
line 20 ("o" in place of "0") causes a redundant error message
to be generated. Improvement of the semantic error
detection and reporting mechanisms in this compiler would be
a worthwhile undertaking for future work.
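
A check of the kind required for the parameter list might look as
follows. This Python sketch is illustrative only: the function
name and the wording of the message are hypothetical, and the
actual check belongs to the action code for production 42, which
is not reproduced here.

```python
MAX_ARGS = 63   # a procedure definition uses six bits of "length"


def check_parameter_count(params, line):
    """Return an error message, or None if the list fits in six bits."""
    if len(params) > MAX_ARGS:
        return "compile error, line %d : too many arguments" % line
    return None


print(check_parameter_count(["number", "base", "chars", "zerosuppress"], 38))
# None
print(check_parameter_count(["p%d" % n for n in range(64)], 7))
# compile error, line 7 : too many arguments
```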

5.   Semantics and Code Emitting

The method for converting semantic actions, provided
by the compiler designer, into a C language program is discussed
briefly in Section IV.B.1. In order for the action
statements to communicate with the parser, a special notation
using "$" variables is employed by YACC. An example of
this notation may be seen by examining the action statements
associated with production 76 in file "m.gram" (Appendix B).
Each symbol on the right-hand side of a production (to the
right of the colon) corresponds to a pseudo-variable, the
name of which is composed of a dollar sign followed by a
digit indicating the relative position of the symbol in the
production. Thus "identifier" has the corresponding
pseudo-variable "$1" in production 76. There is always one
and only one symbol on the left-hand side of a production,
and it has the corresponding pseudo-variable "$$" associated
with it. The "$" notation is a convenience for the compiler
designer, and all pseudo-variables are converted by YACC
into actual C language variables before compilation.

In Section IV.B.3 it is stated that the current
value of "sp" is passed to the parser when an identifier
(other than a reserved word or macro) is scanned. Any such
information passed by "yylex" to the parser may be accessed
in the action statements by referring to the appropriate "$"
variable. Thus, in production 76, the value of "$1" is the
value of "sp" received from "yylex." This value is first
passed as an argument to the procedure "symcheck" (file
"m.act.c"), which checks to see if the variable has been
previously declared. Since production 76 is only applied
during the parsing of declaration statements, this constitutes
a check for a semantic error--the redeclaration of a
variable within the same block of the PL/M program. The
next three statements are executed if this is the first time
the variable is being declared in the current block. First,
the value of "fixv[sp]" is set to zero to indicate that this
is not a based variable. The next statement ("$$ = sytop")
is used to communicate to the next production (possibly 72)
the location at which the symbol table entry for this variable
begins. The third statement calls the "enter" routine
(file "m.sym.c") to actually make the entry in the symbol
table. Whether or not an entry is made in the symbol table,
the final statement is executed to clear the information
associated with this variable from the user-controlled
stacks. The parser then makes the reduction indicated by
production 76 and stores the information associated with
"$$" in one of its parse stacks.

An example of a production which causes intermediate
language code to be emitted can be seen in the action statement
associated with production 9b. In this statement, "$2"
refers to a value received from a previously applied production
(one of the set 97-102). The "emit" routine (file
"m.act.c") is called with two arguments, the first giving
the prefix ("OPK") and the second giving the operator determined
by the value of "$2" (see Appendix A). The "emit"
routine writes the information onto a disk file which may be
used by later stages of the compiler to generate machine
code.

It should be noted that the actions associated with
production 79 also write information to a disk file using
the "putw" routine provided by the C language library. This
second output file is used to store all "initial" values
declared in the PL/M program being compiled.

One final point can be made by considering the action
statements associated with production 33. Productions
such as this are connected with the flow of control statements
in PL/M, and they cause compiler-generated labels to
be produced in order to effect proper branching. Since
these labels are often not generated in the same sequence in
which they must appear in the intermediate language code,
there has to be a mechanism for storing them until they are
emitted. As in the case of production 33, labels are generated
by incrementing the variable "nsym." Code for a conditional
branch is emitted at this point, but the label must
be saved until the remainder of the code around which the
branch occurs has been generated. This is done by calling
the "spush" procedure (file "m.act.c"), which pushes the
label onto the "cstack." In order to save space in the compiler,
the "cstack" is actually not a separate stack but
rather  an  area  at  the  top  of  the  space  allocated  to  the  sym- 
bol table. 
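
The label mechanism just described can be illustrated with a minimal sketch in Python. It is not the code of "m.act.c" or "m.sym.c"; the class name, the "spop" routine, the "TRC" operator, and the use of an ADR instruction to name the branch target are all assumptions made for illustration. Only "nsym," "enter," "spush," the "cstack," and the DEF prefix are taken from the text.

```python
class CompilerTables:
    """Symbol table and "cstack" sharing one fixed block of storage."""

    def __init__(self, size):
        self.space = [None] * size
        self.nsym = 0       # next free symbol-table slot (grows upward)
        self.ctop = size    # top of cstack (grows downward from the end)

    def enter(self, name):
        """Make a symbol-table entry and return its number."""
        if self.nsym >= self.ctop:
            raise MemoryError("symbol table has run into the cstack")
        self.space[self.nsym] = name
        self.nsym += 1
        return self.nsym - 1

    def new_label(self):
        """Generate a compiler label by using up the next symbol number."""
        return self.enter("<label>")

    def spush(self, value):
        """Push a value onto the cstack at the top of the shared space."""
        if self.ctop <= self.nsym:
            raise MemoryError("cstack has run into the symbol table")
        self.ctop -= 1
        self.space[self.ctop] = value

    def spop(self):
        """Pop the most recently pushed value (hypothetical helper)."""
        if self.ctop == len(self.space):
            raise IndexError("cstack is empty")
        value = self.space[self.ctop]
        self.space[self.ctop] = None
        self.ctop += 1
        return value


def compile_if(tables, code, condition, body):
    """Emit code for a one-armed IF statement."""
    code.extend(condition)
    label = tables.new_label()
    code.append(("ADR", label))     # assumed form of the branch target
    code.append(("OPR", "TRC"))     # assumed conditional-branch operator
    tables.spush(label)             # hold the label until the body is emitted
    code.extend(body)
    code.append(("DEF", tables.spop()))


tables = CompilerTables(size=8)
x = tables.enter("X")
code = []
compile_if(tables, code, condition=[("VAL", x)], body=[("LIN", 5)])
```

Because the two areas grow toward each other, neither needs a separately reserved maximum size; the only failure case is the two regions meeting.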

There  are  obviously  many  more  details  concerning  the 
performance of this compiler than can be presented here.
The YACC reference document [30] should be consulted for a
more complete discussion of the capabilities of YACC, and
the program listings in Appendix B should be studied in order
to determine how these capabilities were applied to the
PL/M compiler.

V.   THE  INTERMEDIATE  LANGUAGE


A.   FUNCTION

A  very  important  concept  in  the  design  of  a  compiler  for 
user-definable architectures is that of the intermediate
language.  The  compiler  model  shown  in  Figure  4  assumes  that 
the  source  program  will  be  translated  to  an  intermediate 
form,  although  it  is  certainly  possible  to  design  a  compiler 
which  translates  directly  to  machine  code.  In  fact  many 
compilers have been designed in the latter way, but they
lack transportability and are not able to easily take advantage
of the more advanced optimization techniques.

The idea of using an intermediate language dates back at
least as far as 1958, when there was a discussion of the need
for a universal computer-oriented language (UNCOL) in some
of the early issues of the Communications of the ACM
[12,54,55]. The intent was to have the UNCOL serve as an
intermediary between high-level languages and machine
languages. This would allow a compiler writer to concentrate
on translating from his high-level language to the
UNCOL without worrying about the machine code considerations.
It would also allow a system programmer for a given
machine to write a generator program which produced the best
machine code, independent of the high-level language. Whenever
a new machine was obtained at a computer installation,
only the program which translated from UNCOL to machine code
would have to be changed in order to continue using the
high-level languages which had been used previously. Old
programs could be recompiled without changing the source
language. Similarly, whenever a new language was designed
it would be necessary only to write a translator which could
convert programs written in the new language to UNCOL programs.
The new language would then be available to users of
any computer for which an UNCOL-to-machine-code translator had
been written.

No such universal language has ever been developed, but
the concept of an intermediate language has been used by
many software designers in writing compilers which could
generate code for computers with different architectures and
instruction sets (e.g., the PL/M compilers available from
Intel for the 8008 and 8080 microprocessors). The fact that
an intermediate language is useful in compiling for
user-definable architectures is verified by the use of such a
mechanism by those who are trying to design high-level
languages for microprogrammable machines [18,47].

The  main  function  of  the  intermediate  language  in  the 
PL/M compiler is to serve as an information transmission
medium  between  pass  1  and  succeeding  passes.  In  this  role 
it is complemented by the symbol table (Section IV.B.2) and
the initial value file (Section IV.B.3). The symbol table
transmits  the  names  and  other  attributes  of  the  symbols  used 
in the high-level program, the initial value file transmits
the initial values of variables which are to be initialized,
and the intermediate language carries information about the
actual program steps required by the algorithm. Other information
may be contained in the intermediate language
code, e.g., the line number markers which are helpful in
providing good diagnostics and information to aid the programmer
but are not really needed for the code generation
process.

Besides  its  use  in  transmitting  information  the  inter- 
mediate language  may  have  an  important  role  in  the  process 
of  debugging  and  simulating  the  actions  of  programs 
translated  by  the  compiler.  As  explained  in  Section  V.B 
below, the intermediate language code for the PL/M compiler
can  be  considered  to  be  the  "machine  code"  of  a  mythical 
stack  machine.  It  would  not  be  difficult  to  write  an  inter- 
preter which  could  read  this  code  and  simulate  the  actions 
of  the  mythical  machine  in  order  to  help  the  programmer 
debug his high-level language program. Broca and Merwin [9]
have devoted a paper to this topic, and Reigel and Lawson
[48] have indicated that this could provide an important
facility. 
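
A minimal sketch of such an interpreter is given below in Python. It is offered only as an illustration of the idea and is not part of the compiler described here. The LIN, LIT, VAL, ADR, and OPR prefixes and the ADD and MUL operators are those named in this chapter; the "STO" store operator, the order in which it expects its operands, the use of names in place of addresses, and the 16-bit wraparound are assumptions.

```python
BINARY_OPERATORS = {
    "ADD": lambda left, right: left + right,
    "MUL": lambda left, right: left * right,
}


def interpret(program, memory):
    """Run (prefix, operand) instructions and return the final stack."""
    stack = []
    for prefix, operand in program:
        if prefix == "LIN":
            continue                          # line markers carry no action
        elif prefix == "LIT":
            stack.append(operand & 0xFFFF)
        elif prefix == "VAL":
            stack.append(memory[operand])
        elif prefix == "ADR":
            stack.append(operand)             # the name stands in for an address
        elif prefix == "OPR" and operand == "STO":
            address = stack.pop()             # STO is a hypothetical store operator
            memory[address] = stack.pop()
        elif prefix == "OPR":
            right = stack.pop()
            left = stack.pop()
            result = BINARY_OPERATORS[operand](left, right)
            stack.append(result & 0xFFFF)     # assume 16-bit wraparound
        else:
            raise ValueError(f"unknown prefix {prefix!r}")
    return stack


# X = A + B * 2, written directly in the intermediate language
memory = {"A": 3, "B": 4, "X": 0}
program = [
    ("LIN", 1),
    ("VAL", "A"), ("VAL", "B"), ("LIT", 2),
    ("OPR", "MUL"), ("OPR", "ADD"),
    ("ADR", "X"), ("OPR", "STO"),
]
final_stack = interpret(program, memory)
```

After the run, the simulated memory holds the value assigned to X and the stack is empty, which is the kind of state a programmer could inspect while debugging at the source level.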

B.   THE  POLISH  REPRESENTATION

The intermediate language code for the PL/M compiler is
based upon postfix Polish notation [3,4,22,42]. The reason
for using this type of intermediate language is demonstrated
in Figure 12, which shows how an expression written in infix
(algebraic) notation would be translated. The bottom-up
parsing process for the infix expression in Figure 12(a) can
be visualized in the tree structure of Figure 12(c), which
is a two-dimensional representation of the postfix expression
in Figure 12(b). Thus the intermediate language code
of Figure 12(d) results naturally from the parsing of the
infix expression. Noteworthy is the one-to-one correspondence
between symbols in Figure 12(b) and lines of code in
Figure 12(d).

Figure 12. Example of Polish intermediate language code:
(a) infix expression, (b) equivalent postfix expression,
(c) tree representation, (d) intermediate language code
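
The correspondence between the tree, the postfix expression, and the intermediate language code can be sketched in a few lines of Python. The expression A + B * C used here is an assumed example and is not necessarily the one shown in Figure 12; the function names are likewise illustrative.

```python
OPERATOR_NAMES = {"+": "ADD", "*": "MUL"}


def postfix(tree):
    """Walk the tree bottom-up: left operand, right operand, operator."""
    if isinstance(tree, str):
        return [tree]
    operator, left, right = tree
    return postfix(left) + postfix(right) + [operator]


def intermediate_code(symbols):
    """Produce one (prefix, operand) line per postfix symbol."""
    code = []
    for symbol in symbols:
        if symbol in OPERATOR_NAMES:
            code.append(("OPR", OPERATOR_NAMES[symbol]))
        else:
            code.append(("VAL", symbol))
    return code


tree = ("+", "A", ("*", "B", "C"))     # tree for the infix form A + B * C
symbols = postfix(tree)
code = intermediate_code(symbols)
```

Each postfix symbol yields exactly one line of code, so the two lists always have the same length.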

This type of intermediate language representation is
usually discussed in texts on compiler theory, but the
presentations usually do not provide many details about the
methods used for an entire practical program. Often the
discussion is limited to the type of information presented
in Figure 12, but operators other than the simple arithmetic
type are required in a language designed to represent real
programs. For example, operators are required for branching,
subscript calculation, and stack manipulation. The
complete list of intermediate language prefixes and operators
used in the PL/M compiler, along with their meanings,
is given in Appendix A.

In order to provide an example of how the intermediate
language is used to represent a program, Figure 13 presents
part of the code generated by pass 1 of the PL/M compiler
for the square root program of Figure 1. The symbol table
for this program can be found in Figure 8. Noticeable in
this figure are the expansion factor and the loss of understandability
apparent in going from the high-level language
to the intermediate language. Of course the computer,
through the actions of the compiler, is much better equipped
to cope with this than the human programmer trying to write
this program in assembly language.

Figure 13. Intermediate language code for the
program of Figure 1 and symbol table of Figure 8

The  postfix  Polish  code  is  often  referred  to  as  zero- 
address  code  since  the  operator  instructions  are  intended  to 
manipulate values on the top of a push-down stack and thus
do  not  contain  an  address  field.  The  method  generally  used 
to  generate  machine  code  from  this  zero-address  code  is  to 
simulate  a  mythical  stack  machine  in  the  compiling  process. 
This  "meta-execution"  stack  of  course  does  not  usually  con- 
tain values,  since  most  of  the  variables  in  a  program  have 
values  assigned  at  execution  time  rather  than  at  compile 
time,  but  rather  it  contains  information  about  the  program 
symbols. This type of code generator is fairly simple to
implement,  especially  if  optimization  is  not  too  important, 
and  should  be  fairly  easy  to  adapt  to  a  table-driven  scheme 
(see Chapter VIII).
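
The "meta-execution" approach can be sketched as follows in Python. The target is a hypothetical single-accumulator machine with LOAD, STORE, ADD, and MUL instructions, invented for this illustration and not one of the machines discussed in this thesis. The stack holds descriptions of where each operand will be found at execution time rather than the values themselves, and every result is stored in a temporary, which is simple but makes no attempt at optimization.

```python
def generate(program):
    """Translate zero-address code by simulating the stack at compile time."""
    stack = []        # operand descriptors, not execution-time values
    output = []       # instructions for the hypothetical target machine
    temp_count = 0
    for prefix, operand in program:
        if prefix == "VAL":
            stack.append(operand)             # remember only the symbol
        elif prefix == "LIT":
            stack.append(f"#{operand}")       # assumed immediate-operand syntax
        elif prefix == "OPR":
            right = stack.pop()
            left = stack.pop()
            temp_count += 1
            temp = f"T{temp_count}"           # compiler-generated temporary
            output.append(f"LOAD {left}")
            output.append(f"{operand} {right}")
            output.append(f"STORE {temp}")
            stack.append(temp)                # the result now lives in the temporary
        else:
            raise ValueError(f"unsupported prefix {prefix!r}")
    return output, stack


program = [("VAL", "A"), ("VAL", "B"), ("VAL", "C"), ("OPR", "MUL"), ("OPR", "ADD")]
output, stack = generate(program)
```

In a table-driven version the three instruction templates emitted for each operator would be looked up in a machine description instead of being written into the generator.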

It  should  be  noted  that  each  "instruction"  in  the  inter- 
mediate language  consists  of  two  parts,  a  prefix  and  an 
operator  or  operand.  The  prefix  indicates  the  type  of  the 
instruction, while the second part is an operator (e.g.,
MUL, ADD, TRA) for an OPR prefix or an operand for other
prefixes.  The  LIN  instruction  is  used  to  transmit  line 
numbers  from  the  source  program,  and  the  DEF  instructions 
define  labels  in  the  intermediate  code  for  purposes  of 
branching. The VAL and ADR instructions place values and
addresses, respectively, on the stack, while the LIT instruction
places literal (numeric or immediate data) values
on the stack. The second field of the LIT instruction is
presented in four columns in Figure 13. The first column
gives the decimal value of the 16-bit literal (range:
0-65535), and the second and third columns give the hexadecimal
values of the high and low order bytes, respectively.
The fourth column indicates the two ASCII characters (if
printable)  represented  by  the  value. 
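
The four columns can be derived from a literal as in the following Python sketch. The column widths and the use of a period for a non-printable character are assumptions; Figure 13 may be laid out differently.

```python
def printable(byte):
    """Return the ASCII character for a byte, or "." if it is not printable."""
    return chr(byte) if 0x20 <= byte <= 0x7E else "."


def lit_columns(value):
    """Format a 16-bit literal as decimal, high byte, low byte, and ASCII."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("literal must fit in 16 bits")
    high, low = value >> 8, value & 0xFF
    return f"{value:5d} {high:02X} {low:02X} {printable(high)}{printable(low)}"


rows = [lit_columns(value) for value in (0x4142, 10, 65535)]
```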

C.   OTHER  REPRESENTATIONS

The  reverse  Polish  form  is  not  the  only  one  used  for 
intermediate representation of programs. The two most commonly
used alternatives are triples and quadruples, the
former being equivalent to two-address code and the latter
to three-address code. Using this terminology, the Polish
code  could  be  referred  to  as  "singles." 

Triples  are  more  clearly  representative  of  the  tree 
structure of a program than zero-address code, since each
triple has the form

(operator, operand1, operand2),
and the triples are linked by pointers to show the flow of
control (either operand may actually be a pointer to another
triple whose result is used in the current triple). One
difficulty with this method is that it requires more memory
to represent the program than the Polish method, but Gries
[22] presents a method for modifying the implementation to
reduce the memory requirement.

Quadruples have a "result" field in addition to the
three fields of the triple and take the form

(operator, operand1, operand2, result),
where the fourth field is either a temporary variable generated
by the compiler (in the case of a subexpression of an
arithmetic expression) or a program variable (e.g., the
variable on the left-hand side of an assignment). Some quadruples
(and triples also) will require only the operator
and one operand (e.g., a branch instruction), while others
will require all fields but "operand2" (e.g., for the unary
minus operator: y = -x ==> (-,x,,y)). The difficulties with
quadruples are a greater memory requirement than for the
Polish  form  and  the  large  number  of  temporary  variables 
which  would  be  generated  for  any  significant  program. 
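
The two forms can be compared on a small example with the following Python sketch, which builds both from the same expression tree. The tree encoding, the ("ref", n) notation for a pointer to an earlier triple, and the temporary names T1, T2, ... are conventions assumed for this illustration only.

```python
def to_triples(tree, triples):
    """Append triples for the tree; return an operand naming its result."""
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    left_operand = to_triples(left, triples)
    right_operand = to_triples(right, triples)
    triples.append((operator, left_operand, right_operand))
    return ("ref", len(triples) - 1)      # pointer to the triple just added


def to_quadruples(tree, quadruples, target=None):
    """Append quadruples for the tree; return the name holding its result."""
    if isinstance(tree, str):
        return tree
    operator, left, right = tree
    left_operand = to_quadruples(left, quadruples)
    right_operand = to_quadruples(right, quadruples)
    if target is None:
        target = f"T{len(quadruples) + 1}"    # compiler-generated temporary
    quadruples.append((operator, left_operand, right_operand, target))
    return target


tree = ("+", "A", ("*", "B", "C"))        # right-hand side of Y = A + B * C
triples = []
to_triples(tree, triples)
quadruples = []
to_quadruples(tree, quadruples, target="Y")
```

The triple form needs no temporary names because a result is identified by the position of the triple that produces it, whereas the quadruple form introduces one temporary for every subexpression.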

Many compiler designers prefer triples or quadruples to
Polish  code  U6,i0,41],  because  these  forms  are  claimed  to 
be  easier  to  manipulate  for  optimization  purposes.  This  is 
a  point  which  deserves  further  investigation;  however,  it 
should be noted that powerful optimization techniques have
been successfully applied to programs represented by a Polish
intermediate language [13].

VI.   DIGITAL  SYSTEM  DESCRIPTION

Once the source program has been translated into an
intermediate form and its descriptive information has been
preserved in tables, the job of converting this information
into control code begins. The remaining stages of the compiler
require, in addition to the information transmitted
from pass 1, detailed knowledge of the architecture and
instruction set of the hardware in order to accomplish the
code generation task. Ordinarily this other information
would be included in the remaining stages at the time the
compiler was designed, but this cannot be done if the architecture
is unknown prior to compile time. Thus this chapter
is  concerned  with  the  types  of  information  required  and  the 
problem  of  describing  this  information  for  varying  architec- 
tures. 

Many languages have been developed for describing digital
systems, and two excellent surveys of these languages
have been published [6,?S]. Because most of these languages
were developed by individuals or small groups of individuals
working on specific problems, they have shortcomings which do
not allow universal application. For this reason the
Conference on Digital Hardware Languages, a special continuing
conference of experts in the computer hardware description
field, has been formed in an attempt to define a
language which can become a standard for the industry [37].
Although the main purpose of such languages is to serve as
aids in designing and simulating digital systems, it should
be  obvious  that  they  could  also  be  used  to  describe  a  system 
as  part  of  the  compilation  process. 

Since  there  are  several  levels  of  detail  which  may  be 
used  to  describe  a  digital  system  the  first  problem  is  to 
choose the most appropriate one. Bell and Newell [7] have
defined a hierarchy of five levels for description of computer
systems: the circuit level, the switching circuit level,
the register transfer (RT) level, the programming level,
and the PMS (processor, memory, switch) level. The circuit
level is the lowest level and is well established, with a
notation and set of conventions which have become standardized
over many years of electrical engineering practice.
During the relatively few years that digital electronics has
been in existence the switching circuit level has also become
well established, allowing designers to avoid much of
the detail necessary in describing their systems at the circuit
level. Thus digital circuits are designed with gates
and delays rather than transistors, diodes, and other components
of the circuit level. At the other end of the
scale, the PMS level (although the specific term was coined
by Bell and Newell) has also been in use for some time,
since this is the level used to describe the gross properties
of computer systems. The programming level has also
become well developed, since most digital systems have been
the kind which require a program in order to perform useful
functions. The RT level, which is the one that seems most
natural for conveying the structure of digital systems and
interfacing between the circuit levels and the programming
level, has been recognized as a level since the 1950's but
has only recently been the subject of serious efforts aimed
at formalization [6].

During  the  brief  history  of  the  computer  industry  com- 
puters have  evolved  from  huge  pieces  of  hardware  with  very 
limited capabilities (by today's standards) to very compact
units with very broad, powerful capabilities. For all of
this change, though, the architectures of computer systems
still closely adhere to the concepts originally proposed by
von Neumann [7, ch.4]. Even the development of minicomputers
and microcomputers has not changed this fact, since most
of the same features which were successful on larger computers
have been carried over into these smaller systems. In
fact, the increased competition in the computer industry
which  has  been  caused  by  the  acceptance  of  these  new  types 
of  computers  will  probably  have  the  effect  of  "weeding  out" 
features  which  are  not  well  conceived  or  introduced  merely 
for  uniqueness  and  of  more  or  less  standardizing  features 
which  prove  useful  across  a  wide  range  of  applications. 

No  attempt  will  be  made  here  to  describe  all  of  the 
variations  in  architecture  which  have  evolved  over  the 
years, since a comprehensive survey has been presented by
Bell and Newell [7]. Suffice it to say that, while there
are many differences among the various types of systems at
the switching circuit level, there are many similarities at
the  RT  level.  The  features  which  distinguish  one  system 
from  another  at  this  level  can  be  grouped  according  to  a  few 
system  characteristics. 

A.   BASIC  CHARACTERISTICS 

The two main PMS level building blocks in a conventional
system are the central processing unit (CPU) and the memory.
In most systems the majority of the instructions are devoted
to transferring data between these two units. In order to
represent these ideas at the RT level Barbacci [6] refers to
the basic components as "operators" and "carriers."

Operators are entities that produce information by
transformation of bit patterns to which meaning has been
assigned. These bit patterns reside in carriers, which
are the entities used in storing and transmitting the
information to and from the operators. [6, p.139]

The  operators  can  be  seen  to  represent  the  functions  per- 
formed by  the  CPU  in  the  classical  digital  system,  and  the 
carriers  are  hardware  memory  elements  with  various  degrees 
of  latency.  Wires  and  busses  can  be  considered  short  term 
memories,  while  registers  and  core  memories  carry  informa- 
tion for  longer  intervals  of  time. 

The  basic  characteristics  of  a  digital  system  can  thus 
be  described  by  defining  the  various  carriers  and  operators 
of  which  it  is  constructed.  Barbacci  considers  the  register 
to be the basic unit in defining carriers, and its descriptors
are its name, dimensions, and size of alphabet (typically
2, as in binary systems). All types of memory in a
system can be constructed from registers by forming
subregisters, compound registers, and arrays. An example of
subregisters in a typical system is the partition of the
instruction register into an operation code field and one or
more operand fields. The Intel 8080 microprocessor provides
examples of compound registers; e.g., the H and L registers
are considered as separate registers in several instructions,
but when concatenated they form the address register.
The primary memory of a typical computer can be thought of
as  an  array  of  registers. 
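
One way in which a compiler might hold such carrier descriptions is sketched below in Python. The class names and methods are assumptions made for illustration and are not Barbacci's notation; a binary alphabet is assumed throughout. The instruction register example uses the 8080 register-to-register move, whose single byte divides into a two-bit operation field and two three-bit register fields, and the memory array is kept to sixteen words for brevity.

```python
class Register:
    """A named carrier of a given width."""

    def __init__(self, name, width):
        self.name = name
        self.width = width
        self.value = 0

    def read(self):
        return self.value

    def write(self, value):
        self.value = value & ((1 << self.width) - 1)


class SubRegister:
    """A field of a parent register, starting at low_bit."""

    def __init__(self, name, parent, low_bit, width):
        self.name = name
        self.parent = parent
        self.low_bit = low_bit
        self.width = width

    def read(self):
        return (self.parent.read() >> self.low_bit) & ((1 << self.width) - 1)


class CompoundRegister:
    """A concatenation of registers, most significant part first."""

    def __init__(self, name, parts):
        self.name = name
        self.parts = parts
        self.width = sum(part.width for part in parts)

    def read(self):
        value = 0
        for part in self.parts:
            value = (value << part.width) | part.read()
        return value

    def write(self, value):
        for part in reversed(self.parts):
            part.write(value)          # each part keeps only its own low bits
            value >>= part.width


reg_h, reg_l = Register("H", 8), Register("L", 8)
hl = CompoundRegister("HL", [reg_h, reg_l])
hl.write(0x1234)

ir = Register("IR", 8)
ir.write(0x78)                          # 8080 encoding of MOV A,B
operation = SubRegister("OP", ir, 6, 2)
destination = SubRegister("DST", ir, 3, 3)
source = SubRegister("SRC", ir, 0, 3)

memory = [Register(f"M[{address}]", 8) for address in range(16)]
memory[hl.read() & 0xF].write(0xAB)
```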

The  two  most  primitive  kinds  of  operators  in  a  digital 
system  are  the  ones  usually  represented  in  a  high-level 
language: logical (negate, inclusive or, exclusive or, and,
equivalence) and arithmetic (addition, subtraction, multiplication,
division). Other operators needed in describing the
system include vector operators for manipulating registers
(shift, rotate), transfers, concatenation, and special
operators (counting, exchanging, etc.).

It  is  necessary  but  not  sufficient  for  a  compiler  to 
have  information  about  the  operators  and  carriers  of  a  com- 
puter system.  Since  the  primary  purpose  of  the  compiler  is 
to generate control code for the computer, information
describing the instruction set is also required. In essence
the compiler performs a mapping from intermediate language
code to machine code, and it is necessary to provide sufficiently
detailed information to carry out this mapping. The
level  of  detail  will  be  much  greater  if  the  intermediate 
language  is  to  be  translated  into  microcode  than  if  it  is  to 
be translated into a conventional machine language. The bit
patterns of all of the machine instructions are required in
order to produce the actual machine code, and the mnemonics
of  the  instructions  (in  an  assembly  language  type  of  format) 
may  be  required  in  order  to  produce  a  version  of  the  machine 
code  which  can  easily  be  read  by  the  programmers  who  will  be 
trying  to  debug  the  programs  produced.  Special  types  of 
instructions (subroutine jumps and returns, interrupt producing
and handling, input/output) must be accounted for,
and  any  side  effects  of  instructions  (such  as  the  setting  of 
condition  codes)  must  be  described.  Methods  of  addressing 
will  have  an  effect  on  the  code  produced  and  will  also  have 
to  be  described.  For  some  machines  it  is  necessary  to 
transfer  operands  from  main  memory  to  registers  in  order  to 
operate  on  them,  while  other  machines  have  instructions 
which  operate  on  operands  directly  in  main  memory.  Many 
machines have both types of instructions. Indirect addressing,
indexing, and stack manipulation are important features
which  also  must  be  described. 
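
As a rough illustration of the kind of information involved, the Python sketch below holds a two-entry instruction description as data and uses it to map intermediate language operators to machine code and to a readable listing. The record layout and the operator name "SUB" are assumptions, and the sketch supposes that the operands have already been placed in the A and B registers. The opcodes shown are the 8080 encodings of ADD B and SUB B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionDescription:
    mnemonic: str        # assembly-language form, for readable listings
    bit_pattern: int     # one-byte machine code
    sets_flags: bool     # side effect: condition codes are altered


# Keys are intermediate language operators; "SUB" is an assumed name.
INTEL_8080 = {
    "ADD": InstructionDescription("ADD B", 0x80, True),
    "SUB": InstructionDescription("SUB B", 0x90, True),
}


def translate(operators, description):
    """Map operators to machine code and an assembly-style listing."""
    machine_code = bytearray()
    listing = []
    for operator in operators:
        entry = description[operator]
        machine_code.append(entry.bit_pattern)
        listing.append(f"{entry.bit_pattern:02X}  {entry.mnemonic}")
    return bytes(machine_code), listing


machine_code, listing = translate(["ADD", "SUB"], INTEL_8080)
```

Supplying a different table to the same routine would retarget it, which is the property sought in a compiler for user-definable architectures.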

The  amount  of  detail  provided  about  the  instruction  set 
and  the  physical  characteristics  of  the  machine  has  a  direct 
bearing  on  the  capability  of  the  compiler  to  produce  "good" 
machine  code.  If  sufficient  information  is  available 
machine-dependent  optimizations  can  be  performed  on  the  code 
as it is being generated, as discussed in Section VII.A.1.

B.   MICROPROGRAMMING  AND  MODULARITY 

Two basic classes of systems can be distinguished, the
first being the conventional or classical type,
characterized by a fixed or hardwired instruction set.
Microprogrammable systems, in which the instruction set (in
the usual sense of the term) is able to be changed by altering
a memory, form the second class. Included in the latter
class are modular systems, which in addition to having a
variable instruction set have a variable machine organization.
Both classes of systems require the basic types of
information discussed above to be transmitted to the compiler.
The additional types of information required by the
compiler for microprogrammable systems are discussed below.

Microprogrammable systems have been growing rapidly in
use in the last few years, but for all of the special attention
which has been devoted to it, microprogramming is not
significantly different from "regular" programming. Reigel
and Lawson have defined microprogramming as "... a technique
for implementing the control function of a digital computing
system as sequences of control signals that are organized on
a word basis and stored in a memory." [48, p.2] There is
nothing in this definition which does not apply equally well
to a non-microprogrammed computer. Eckhouse has noted that,
with respect to microprogrammable hardware, "... all of the
machines can be classified as classical, von Neumann in
nature with only minor perturbations." [18, p.172]

What microprogramming has done is allow increased flexibility
in digital system design by providing the designer
with greater access to the hardware. IBM was the first company
to successfully apply microprogramming when it produced
the S/360 family of computers. The designers were able "...
to achieve a range of compatible processors offering the
same large machine instruction set at many different levels
of performance." lb f p. 31 It is interesting to note that, in
a sense, the common machine code of this series could be
considered  a  machine-independent  programming  language. 

Perhaps the attraction of microprogramming can best be
appreciated by considering the distinction which Barbacci
makes between architecture and machine organization. He
considers the architecture to be the behavioral description
of a system; i.e., what the programmer perceives the system
to be. On the other hand, the machine organization is "...
the particular combination of registers, busses, combinational
networks, and control ..." in a system. [6, p.144]

The  architecture  influences  the  machine  organization  by 
imposing a set of requirements (a particular instruction
set) and the organization, mainly for technological reasons,
influences the architecture of the machine. The
result is usually that a given computer architecture can
be implemented on a set of machine organizations, and a
given organization accepts several architectures.
[6, p.144]

Thus  in  a  conventional  computer  system  it  is  necessary  to 
describe only the architecture of the machine. The instruction
set serves as a level of abstraction separating the
programmer (or compiler) from the machine organization. In
a microprogrammed system another instruction set may be
defined in order to preserve this abstraction, but if the
programmer  is  to  work  in  a  high-level  language  it  may  be 
possible  (but  not  necessarily  desirable)  to  skip  this  step 
by  allowing  the  compiler  to  interface  directly  with  the 
machine organization.

The major hurdle in microprogramming is the introduction
of timing considerations to the list of information required.
In describing the basic information required by a
compiler, no mention was made in Section VI.A of the timing
of instructions. The reason for this is that conventional
systems usually operate in a sequential fashion, with the
execution of one instruction not beginning until the previous
instruction has been executed. Even though many events
may be occurring in parallel, this is hidden by the instruction
set. One of the values of microprogramming is that it
allows the programmer to specify the concurrency of certain
events. Unfortunately this additional flexibility is obtained
by increasing the amount of detail with which the
programmer must cope. This is why instruction sets similar
to those of conventional computers are usually defined for
microprogrammable systems. For example, the Intel 3000
series of microprocessor chips has been advertised as a
microprogrammable microprocessor; however, the series has
not yet been exploited to its fullest potential because
Intel is still in the process of defining a higher-level
instruction set comparable to the typical machine language.
More of the considerations involved in working with concurrent
systems are presented below in Section VI.C.

Another  trend  which  is  occurring  in  the  digital  elec- 
tronics field  is  the  development  of  modular  systems 
(lb, 41/56).  The  reasons  for  development  of  such  systems  are 
very similar to the reasons for some of the software
engineering concepts (see Section I.B). In fact, modularity
(of programs) is one of the principles of software engineering.
In addition to the variable architecture characteristic
of other microprogrammable systems, modular systems have
variable  machine  organizations.  Thus  the  task  of  program- 
ming these  systems  is  even  more  difficult. 

If  development  of  different  types  of  components  can  be 
reduced, and if standardization of modules, test procedures,
and logistic support can be achieved, the life-cycle
cost of systems can be greatly reduced. One
approach to implementation of this idea is to identify a
level of modularity for components which can have wide
application in many types of systems. This allows the
development cost of the modules to be spread over many
units, while reducing the quantity of components and the
logistics costs. [56, p.3]

Figure 14. Example of a three-bus
modular system using QED modules

One of many possible configurations for a modular system
is shown in Figure 14 [56, p.19]. The control module in
this type of system would probably consist of a read-only
memory programmed to initiate all actions required by the
system performance specifications. This type of system
would be very similar to a microprogrammable computer in its
operation and control code structure. Thus the main problem
in programming such a system will be to take maximum advantage
of parallelism. The designer will also have the problem
of selecting the most appropriate types and numbers of
modules to use and the problem of choosing the most efficient
organization (i.e., the number of busses and the connections
of modules to busses).

An alternative type of modular system would spread the
control function among all of the modules rather than
consolidating it into a single module. Such a system would
not be program controlled but would accomplish its tasks by
having the various modules communicate with one another by
means of "ready" and "acknowledge" signals. Because such
modules would not be as flexible as the type shown in Figure
14, they probably will not be as widely used.

C.   PARALLELISM 

The rapid growth of the computer industry has been
spurred by the ever-increasing speed of computer hardware
brought about by continuing advances in the electronic
component industries. Vacuum tubes gave way to transistors
in the late 1950's and early 1960's, and the latter were
supplanted by integrated circuits in the mid-1960's. These
small-scale integrated (SSI) circuits were soon antiquated
by medium-scale integration (MSI) technology, and direct
gate-level and register-level design (without considering
the circuit level) became possible. Today large-scale
integrated (LSI) circuitry is available in large quantities,
and whole subsystems can be manufactured on one small chip
of silicon. While these tremendous increases in circuit
density have played an important part in increasing the
speed of digital systems, they have been accompanied by
advances in the state of the art in semiconductor
manufacture which have allowed much faster switching times
to be achieved.

Unfortunately, there are physical limits to the
processes which have brought about these vast changes, and
the semiconductor industry will soon be nearing them. As a
result, increases in computer circuit speed will be coming
at a much slower rate than in the past (unless some new
technology is discovered which does not depend on the motion
and storage of electrons). Thus any further major advances
in computer speed will have to rely on increased use of
advanced machine organization techniques which take
advantage of features such as parallelism and pipelining.
Parallelism refers to the concurrent performance of multiple
tasks in a system, where a task is "... a self-contained
portion of a computation [or some other computer operation]
that once initiated can be carried out to its completion
without need for additional inputs." [46, p. 986] Pipelining
is accomplished by dividing a task into many independent
subtasks such that a new process can begin the task as soon
as the previous one has completed the first subtask. In
this way many processes can be performing the same task,
each being in a different stage of completion.
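
The benefit of pipelining can be illustrated with a small timing
calculation. The sketch below assumes an idealized pipeline in which
every subtask takes one time unit and no process ever has to wait for
another; the function names and the unit-time model are illustrative
assumptions and are not taken from the references.

```python
def serial_time(num_processes: int, num_subtasks: int) -> int:
    """Time units needed when each process finishes every subtask
    before the next process is allowed to start."""
    return num_processes * num_subtasks


def pipelined_time(num_processes: int, num_subtasks: int) -> int:
    """Time units needed when a new process starts as soon as the
    previous one has completed the first subtask (ideal pipeline)."""
    if num_processes == 0:
        return 0
    # The first process needs num_subtasks units; every later process
    # finishes one unit after its predecessor.
    return num_subtasks + (num_processes - 1)


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, serial_time(n, 5), pipelined_time(n, 5))
```

Under these assumptions the advantage grows with the number of
processes: for a single process the two times are equal, while for a
long stream of processes the pipelined time approaches one time unit
per process.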

Another factor which has resulted in the increased use
of parallelism in digital systems is the growing emphasis on
modularity and microprogramming discussed in Section VI.B.
Modularity in hardware may be looked at from several
viewpoints, from small modules such as registers and busses
considered in microprogramming to large functional modules
as discussed by Tinklepaugh and Eddington [56]. This wide
diversity means that there are several levels of parallelism
which must be considered, each with its own unique problems
and methods of solution [46].

The fact that parallelism is a significant consideration
in the design of a digital system is highlighted by the fact
that at least one entire book has been devoted to the
subject [39]. Several high-level language compilers have been
developed or proposed for use with the new generation of
array processors, which rely heavily on the use of
parallelism at the instruction and arithmetic expression
levels [35]. Ramamoorthy, Park, and Lee [46] present a good
overview of some of the factors involved in working with
parallelism and present several algorithms for taking
advantage of it at the arithmetic expression and
subexpression levels.


There are two basic ways in which parallelism may be
handled in a high-level language--explicitly or implicitly.
In the explicit approach there are special instructions
included in the language (e.g., FORK and JOIN) by which the
programmer may indicate sections of code which may be
executed in parallel.

The explicit approach is advantageous for the recognition
and representation of parallelism between blocks of
instructions or between instructions, since the analysis
of parallelism between tasks at these levels is simple.
However, the explicit approach is not advantageous for
recognizing and representing parallelism at arithmetic
operation (subexpression) or micro-step level, because it
is tedious and mistake prone. [46, p. 986]
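
The explicit approach at the block level can be sketched as follows.
Python threads are used here only as stand-ins for FORK and JOIN
instructions; the function names and the choice of summing two halves
of a list are illustrative assumptions, not constructs from any of the
languages cited.

```python
import threading


def sum_block(values, results, slot):
    """One independent block of work: store the sum of values."""
    results[slot] = sum(values)


def fork_join_sum(data):
    """Sum data by explicitly marking two blocks as parallel."""
    mid = len(data) // 2
    results = [0, 0]
    workers = [
        threading.Thread(target=sum_block, args=(data[:mid], results, 0)),
        threading.Thread(target=sum_block, args=(data[mid:], results, 1)),
    ]
    # FORK: the programmer states that both blocks may run concurrently.
    for worker in workers:
        worker.start()
    # JOIN: execution continues only after both blocks have completed.
    for worker in workers:
        worker.join()
    return results[0] + results[1]


if __name__ == "__main__":
    print(fork_join_sum(list(range(10))))
```

Marking two subroutine-sized blocks this way is simple; applying the
same markings to every subexpression would be tedious in the way the
passage above describes.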

The implicit approach places the burden on the compiler by
incorporating two new steps in the compilation process. The
first step involves the recognition of parallel processable
tasks, and the second involves representation of the
information obtained in the first step and allocation of
resources in such a manner that maximum advantage is taken
of the parallelism. "This approach involves considerable
overhead to recognize parallel tasks in a program although
it relieves programmers of additional duties." [46, p. 986]
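
The recognition step of the implicit approach can be sketched with a
simple data-dependence test. The sketch uses the well-known Bernstein
conditions (two statements may run in parallel if neither writes a
variable that the other reads or writes); this is a substituted
textbook technique, not the algorithm of [46], and the `Stmt` type and
function names are hypothetical.

```python
from typing import List, NamedTuple


class Stmt(NamedTuple):
    target: str          # variable written by the statement
    sources: frozenset   # variables read by the statement


def independent(s: Stmt, t: Stmt) -> bool:
    """True if s and t have no read/write or write/write conflict."""
    return (s.target not in t.sources
            and t.target not in s.sources
            and s.target != t.target)


def parallel_levels(stmts: List[Stmt]) -> List[int]:
    """Assign each statement the earliest level at which it can run;
    statements on the same level can be executed concurrently."""
    levels: List[int] = []
    for i, stmt in enumerate(stmts):
        level = 0
        for j in range(i):
            if not independent(stmts[j], stmt):
                level = max(level, levels[j] + 1)
        levels.append(level)
    return levels


if __name__ == "__main__":
    program = [
        Stmt("a", frozenset({"b", "c"})),   # a = b + c
        Stmt("d", frozenset({"e", "f"})),   # d = e * f
        Stmt("g", frozenset({"a", "d"})),   # g = a + d
    ]
    print(parallel_levels(program))
```

The first two statements share no variables and land on the same
level, while the third must wait for both. The pairwise comparison is
the kind of compile-time overhead referred to in the quotation.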

Thus, as is usually the case in design work, there are
tradeoffs involved in determining whether to use the
explicit or the implicit approach to parallelism. At the
current state of the art in compiler design it is probably
desirable to use some combination of the two. The explicit
approach can be used for blocks of instructions (e.g.,
subroutines), and the implicit approach can be used at the
arithmetic expression level to reduce the number of
programming errors which would result from using the
explicit approach at this level.

In either approach the compiler can be given the job of
allocating the resources for execution of the program. The
major problem in this area involves the proper
synchronization of the various pieces of hardware. Again
there appear to be two possible methods for approaching this
problem. One way would require that the description of each
module contain information about the maximum time required
for the module to perform a given function. Then once the
compiler had assigned a task to a particular module it would
have to allow the specified amount of time before it could
issue an instruction requiring the results of that module to
be available. This approach has the disadvantage that it
would not allow the hardware to operate at maximum speed,
since many functional modules (e.g., multipliers) have a
wide variation in speed depending on the input data. The
other approach is the one used in modern operating systems
for sophisticated computers. In essence this would involve
setting up a small operating system which would perform the
resource allocation at execution time rather than compile
time. Synchronization would be accomplished by sending
control signals to the various modules and receiving signals
from the modules to indicate task completion. Obviously
this method involves a fairly significant amount of overhead
in the form of additional memory required to hold the
operating system. Thus the programmer would have another
tradeoff to make in determining which method would be most
appropriate for his application.
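
The timing side of this tradeoff can be sketched numerically. The
operation names, execution times, and the optional per-operation
handshake time below are hypothetical values chosen for illustration;
the memory needed to hold the small operating system is not modelled.

```python
def compile_time_total(ops, worst_case):
    """Total time when the compiler allows each module its maximum
    specified time, whatever the data actually require."""
    return sum(worst_case[kind] for kind, _actual in ops)


def run_time_total(ops, handshake_time=0):
    """Total time when each module signals completion, so that only
    the actual time (plus any handshake time) is spent."""
    return sum(actual + handshake_time for _kind, actual in ops)


if __name__ == "__main__":
    # (operation, actual time for this particular input data)
    ops = [("multiply", 6), ("add", 2), ("multiply", 14)]
    worst_case = {"multiply": 16, "add": 2}
    print(compile_time_total(ops, worst_case))
    print(run_time_total(ops))
    print(run_time_total(ops, handshake_time=1))
```

With these made-up figures the completion-signal method is faster
because the multiplier rarely needs its worst-case time, but the gain
shrinks as the cost of each handshake grows.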

As in the other sections of this chapter, many ideas
have been presented in this section. No specific
recommendations have been made as to which of them may be
applicable to a compiler for user-definable architectures,
since the determination of such recommendations will require
a considerable amount of additional research and
experimentation. The intent here has been to exhibit a (not
necessarily all-inclusive) list of some of the things which
must be considered in implementing "pass 2" of such a
compiler.

VII.   COMPILER OPTIMIZATION

Optimization is a frequently pursued goal in the design
of engineering systems, whether they be hardware systems or
software systems. The mathematical solution of an
optimization problem requires finding the minimum of a cost
function (or maximum of a reward function) while satisfying
a set of constraints. Unfortunately the equations involved
are often nonlinear, making a closed-form solution
impossible. Attempts at solution by enumerating all the
possibilities are usually not practical for nontrivial
problems, because the enumeration expands in a combinatorial
manner. (In optimal control theory this problem has
sometimes been referred to as the "curse of
dimensionality.") Thus, though a large body of theory has
been developed to deal with optimization problems, often the
only practical solution to a problem involves the use of ad
hoc methods. Such has been the case to a large extent in
dealing with the problem of code optimization.

Another significant barrier to the application of good
optimization techniques is the general difficulty of
specifying what constitutes an optimal solution to a given
design problem. In fact it has been noted by Aho and Ullman,
with respect to the code generated by a compiler,

... that there is no algorithmic way to find the shortest
or fastest-running program equivalent to a given program.
... Thus the term optimization is a complete misnomer--in
practice we must be content with code improvement. Various
code improvement techniques can be employed at various
phases of the compilation process. [3, p. 70-71]

It is the purpose of this chapter to discuss the motivation
for research in the area of compiler optimization and to
examine some of the formal techniques which may prove useful
in implementing compilers for firmware design languages.

A.   MOTIVATION 

As far back as the mid-1950's, when the FORTRAN I
compiler was being designed, it was recognized that
convenience alone was not enough to persuade programmers to
use high-level languages [52]. Unless the compiler could
produce machine code which was comparable in efficiency to
hand-coded programs there would be a great deal of
resistance to the use of high-level languages. In the
intervening years computer architectures and instruction
sets have increased in complexity, making it even more
difficult for a compiler to match a good assembly language
programmer.

Three computer hardware trends which have developed over
the years are the increase in speed, the increase in main
memory size, and the increase in size and power of the
instruction set. These trends have had the effect of
reducing the need for optimization in compilers, since for
many applications the hardware efficiency more than offset
the compiler-generated code inefficiency. In recent years
there has been yet another trend--the acceptance of
minicomputers, and now microcomputers, as components in the
design of larger systems. In such applications the cycle is
beginning to repeat, since these smaller computers typically
have slow execution times, a small amount of memory, and a
relatively limited number of instructions. Thus the programs
written for these devices must be as efficient as possible
in order to minimize the amount of hardware used. Even in
sophisticated microprogrammable systems, though, the amount
of hardware used may be critical in determining the
profitability of a given design. As a consequence, code
optimization is becoming increasingly important in order to
allow firmware designers to take advantage of all the
benefits of high-level language programming.

An example of the kinds of inefficiencies involved is
shown in Figures 15-17. Figure 15 shows a PL/M program for
performing a simple bubble sort, while Figure 16 shows a
hand-coded Intel 8080 assembly language version of the same
program [43]. Note that neither of these programs would
actually be run by itself but would probably be a procedure
in a larger program (in which ARRAY and N would be given
values). The purpose here is to examine the code generated
for the sorting algorithm without getting involved in the
various issues of subroutine linkage.

    DECLARE ARRAY(256) BYTE,
            (N, I, T1, T2, SWITCHED) BYTE;
    SWITCHED = 1;
    DO WHILE SWITCHED;
        SWITCHED = 0;
        DO I = 1 TO N - 1;
            T1 = ARRAY(I); T2 = ARRAY(I + 1);
            IF T1 > T2 THEN
                DO;
                    ARRAY(I+1) = T1;
                    ARRAY(I) = T2;
                    SWITCHED = 1;
                END;
        END;
    END;
    EOF

Figure 15.   PL/M bubble sort program


            MVI   D,1
    L1:     MOV   A,D
            ADI   0
            JZ    L2
            MVI   D,0
            MVI   H,N.H
            MVI   L,N.L
            MOV   B,M
            MVI   H,ARRAY.H
            MVI   L,ARRAY.L
    L3:     DCR   B
            JZ    L1
            MOV   A,M
            INR   L
            CMP   M
            JP    L3
            MOV   C,M
            MOV   M,A
            DCR   L
            MOV   M,C
            INR   L
            MVI   D,1
            JMP   L3
    L2:     HLT

Figure 16.   Hand-coded 8080 assembly language
version of bubble sort program (after Popper [43])


            LXI   SP,00F4H
            LXI   H,SWITCHED
            MVI   M,1
    L1:     LXI   H,SWITCHED
            MOV   A,M
            RRC
            JNC   L4
            MVI   M,0
            MVI   L,I.L
            MVI   M,1
    L2:     LXI   H,N
            MOV   C,M
            DCR   C
            MOV   A,C
            INR   L
            SUB   M
            JC    L1
            LHLD  I
            MVI   H,0
            LXI   D,ARRAY
            DAD   D
            MOV   A,M
            LXI   H,T1
            MOV   M,A
            DCR   L
            MOV   C,M
            INR   C
            MOV   L,C
            MVI   H,0
            LXI   D,ARRAY
            DAD   D
            MOV   A,M
            LXI   H,T2
            MOV   M,A
            DCR   L
            SUB   M
            JNC   L3
            DCR   L
            MOV   C,M
            INR   C
            MOV   L,C
            MVI   H,0
            LXI   D,ARRAY
            DAD   D
            XCHG
            LXI   H,T1
            MOV   C,M
            MOV   A,C
            STAX  D
            LHLD  I
            MVI   H,0
            LXI   D,ARRAY
            DAD   D
            XCHG
            LXI   H,T2
            MOV   C,M
            MOV   A,C
            STAX  D
            INR   L
            MVI   M,1
    L3:     MVI   L,I.L
            MOV   C,M
            INR   C
            MOV   M,C
            JNZ   L2
            JMP   L1
    L4:     EI
            HLT

Figure 17.   Reformatted PL/M compiler output
for bubble sort program of Figure 15

Figure 17 shows the output (reformatted by the author
for ease of comparison) of the Intel 8080 PL/M compiler,
version 1.0. Not counting storage space for the variables,
the hand-coded version requires 40 bytes of storage, and the
compiler version requires 116 bytes, a relative inefficiency
of 190 percent. A similar but somewhat larger version of
this program, cited by Falk [19], was hand-coded for the
Intel 8008, and required 347 bytes of code. The initial
version of the 8008 PL/M compiler generated 495 bytes of
code for this larger program--a relative inefficiency of 47
percent. A later version of the 8008 PL/M compiler
(containing improved optimization techniques) generated 388
bytes of code, yielding a 12 percent relative inefficiency
[34].
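
The relative inefficiency figures quoted above appear to be the excess
of the compiled code size over the hand-coded size, expressed as a
percentage of the hand-coded size. The sketch below assumes that
definition (it reproduces the 190 percent and 12 percent figures); the
function name is hypothetical.

```python
def relative_inefficiency(hand_coded_bytes: int, compiled_bytes: int) -> float:
    """Excess size of the compiled code as a percentage of the
    hand-coded size (assumed definition)."""
    return 100.0 * (compiled_bytes - hand_coded_bytes) / hand_coded_bytes


if __name__ == "__main__":
    # 8080 bubble sort: 40 bytes hand-coded, 116 bytes compiled.
    print(round(relative_inefficiency(40, 116)))
    # Larger 8008 program: 347 bytes hand-coded, 388 bytes from the
    # later compiler version.
    print(round(relative_inefficiency(347, 388)))
```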

This tends to confirm the fact that, in most
applications, compiler-produced code compares more favorably
with assembly code as the size of the program increases.
Also the current PL/M compilers use relatively
unsophisticated optimization techniques, and further
improvements could be obtained with relatively little
additional effort.

Comparison of Figures 16 and 17 shows that the bubble
sort program brings out two of the most severe problems in
compiler code generation--the register allocation problem
and the subscript calculation problem. Less significant,
but also evident, are the differences in the methods of
branching for the loops and the four extra bytes of code
generated by the compiler for all programs (to set the stack
pointer at the beginning and enable interrupts at the end).

The assembly language version takes advantage of the
fact that there are enough index registers available on the
8080 to hold all temporary variables needed in the sort
routine. This saves at least three bytes of code (to load
the address into the HL register) each time one of these
variables is referenced (unless the address is already in HL
from a previous reference).

The greatest saving in the program of Figure 16 results
from the use of the HL register as a pointer into the array
being sorted. The programmer realized that the elements of
the array being referenced at any time were always adjacent
to one another, and he stepped through the comparisons and
swaps by appropriately incrementing and decrementing the
address register. The current compiler is not capable of
making this optimization and so recomputes subscripts for
each variable reference.
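
The difference between the two styles of array access can be sketched
in a few lines. The list `memory`, the `base` address, and both
function names are hypothetical; the sketch only mirrors the idea of
recomputing base-plus-subscript for every reference versus stepping a
single pointer between adjacent elements, and it is not a translation
of Figures 16 and 17. The two counts returned measure different
things (full address computations versus one-step pointer moves).

```python
def pass_with_subscripts(memory, base, n):
    """One bubble-sort pass over n elements starting at base.
    Every array reference recomputes base + subscript."""
    recomputed = 0
    for i in range(n - 1):
        t1 = memory[base + i]            # subscript calculation
        t2 = memory[base + i + 1]        # subscript calculation
        recomputed += 2
        if t1 > t2:
            memory[base + i + 1] = t1    # subscript calculation
            memory[base + i] = t2        # subscript calculation
            recomputed += 2
    return recomputed


def pass_with_pointer(memory, base, n):
    """The same pass using one pointer that is only incremented or
    decremented, since the elements involved are always adjacent."""
    pointer = base
    steps = 0
    for _ in range(n - 1):
        first = memory[pointer]
        pointer += 1
        steps += 1
        second = memory[pointer]
        if first > second:
            memory[pointer] = first
            pointer -= 1
            memory[pointer] = second
            pointer += 1
            steps += 2
    return steps


if __name__ == "__main__":
    a = [9, 9, 3, 1, 2, 9]
    b = list(a)
    print(pass_with_subscripts(a, 2, 3), a)
    print(pass_with_pointer(b, 2, 3), b)
```

Both passes leave the array in the same state. On a machine such as
the 8080 each pointer step can be a single short instruction, whereas
each recomputed subscript requires a sequence of loads and an address
addition, as the code in Figure 17 shows.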

An attempt was made to rewrite the PL/M program to more
closely match the structure of the assembly language version
(see Figure 18). In the new program the iterative loop was
replaced with a WHILE loop, and the swapping process was
modified so that it would use only one temporary variable.
Unfortunately the savings produced by these changes were
offset by the computation of one additional subscript, and
the new program generated as much code as the old.

    DECLARE ARRAY(256) BYTE,
            (SWITCHED, N, I, TEMP) BYTE;
    SWITCHED = 1;
    DO WHILE SWITCHED;
        SWITCHED = 0;
        I = N;
        DO WHILE (I := I - 1);
            IF ARRAY(I) > ARRAY(I+1) THEN
                DO;
                    TEMP = ARRAY(I);
                    ARRAY(I) = ARRAY(I+1);
                    ARRAY(I+1) = TEMP;
                    SWITCHED = 1;
                END;
        END;
    END;
    EOF

Figure 18.   PL/M program revised to match
the control structures of the assembly language version


    DECLARE ARRAY(256) ADDRESS,
            (N, I, TEMP) ADDRESS,
            SWITCHED BYTE;
    SWITCHED = 1;
    DO WHILE SWITCHED;
        SWITCHED = 0;
        I = N - 1;
        DO WHILE (I := I - 1) + 1;
            IF ARRAY(I) > ARRAY(I + 1) THEN
                DO;
                    TEMP = ARRAY(I);
                    ARRAY(I) = ARRAY(I+1);
                    ARRAY(I+1) = TEMP;
                    SWITCHED = 1;
                END;
        END;
    END;
    EOF

Figure 19.   Modified PL/M program

As mentioned in Chapter II, one of the advantages of
programming in a high-level language is the ease with which
changes can be made. Figure 19 shows the minor PL/M program
changes (to the declaration statements) which would be
required if the array to be sorted contained more than 256
values and if the values were double-byte (address) rather
than single-byte. Two other changes have been indicated in
order to make the program technically correct. Of course
the changes caused much more code to be generated (197 bytes
as opposed to 116) in order to handle the double-byte
arithmetic and data transfer operations. The interesting
point here is that an assembly language version, if one had
been written for this new problem, would certainly be much
longer than 40 bytes since there would be insufficient
registers available to hold all of the temporary values.
Also the compiled version would have a much lower relative
inefficiency in relation to such a hand-coded version of the
new program. The amount of effort required to change the
assembly language program would obviously have been many
times greater than was the case for the PL/M version.

B.   TECHNIQUES 

In his excellent compiler optimization survey Schneck
[52] has classified optimization techniques into three
functional categories based upon the amount of knowledge
they require about the object machine. He calls the three
categories machine-dependent, architecture-dependent, and
architecture-independent. Some of the more important
techniques in each category are highlighted below in order
to show that many of the inefficiencies usually associated
with compiler-generated code can be eliminated if careful
attention is paid to optimization.

1.   Machine-Dependent

Machine-dependent optimizations are also classified
as local optimizations since they are applied to short spans
of code during the code generation process rather than prior
to code generation as indicated in Figure 4. Thus these
techniques require a detailed knowledge of the instruction
set of the object machine. For example, if the operation to
be performed is an addition and one of the operands is known
at compile time to have a value of one, the code generated
would be an increment instruction if one were available.
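
The increment example can be sketched as a special-case test made
during code selection. The function below is a hypothetical fragment
of a code generator for the 8080 accumulator; INR, ADI, and ADD are
real 8080 mnemonics with the byte counts shown, but the function
itself is illustrative only. It also assumes that the carry flag is
not needed afterward, since INR leaves the carry flag unchanged
whereas ADI and ADD set it.

```python
def select_add(constant=None, source_register="B"):
    """Choose an instruction that adds to the accumulator.

    constant        -- operand value if known at compile time, else None
    source_register -- register holding the operand when it is not constant
    Returns (instruction text, size in bytes).
    """
    if constant == 1:
        # Special case: an increment is shorter than an immediate add.
        return ("INR A", 1)
    if constant is not None:
        return (f"ADI {constant}", 2)
    return (f"ADD {source_register}", 1)


if __name__ == "__main__":
    print(select_add(1))
    print(select_add(5))
    print(select_add())
```

Each such test must be written explicitly by the compiler designer for
the particular instruction set, which is why this class of
optimization depends so heavily on the object machine.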

The majority of the optimizations in pass 2 of the
Intel PL/M compilers fall into this category. The results
of some of the more subtle ones can be seen in Figure 17.
It should be noted that the MVI instruction has been used
whenever possible to perform data transfers. This
instruction requires two bytes of memory rather than the
three required by the LXI instruction, which could also have
been used for this purpose. Also noteworthy is the use of
the increment and decrement instructions.

As an indication that these kinds of optimizations
may not be as easy to apply as it might at first appear,
consider Figure 20. This figure shows the PDP-11 machine
code [15] generated for the two functionally equivalent sets
of C language statements discussed in Section III.6. It can
be seen that, while the compiler has used increment and
decrement instructions in both cases, the code in Figure
20(b) is less efficient than the other, even though it has
been passed through the optimizer associated with the C
compiler. (In fairness, it should be noted that the optimizer
is claimed to be only experimental.) This points up the fact
that machine-dependent optimizations tend to be applied in
an ad hoc manner, that is, by testing a series of conditions
which would indicate special cases in the code being
generated.

    i = ++j - k;                   inc   j
    i = j-- - k;                   mov   j,r0
                                   sub   k,r0
                                   mov   r0,i
                                   mov   j,r0
                                   sub   k,r0
                                   dec   j
                                   mov   r0,i

                         (a)

    i = (j = j + 1) - k;           mov   j,r0
    i = j - k;  j = j - 1;         inc   r0
                                   mov   r0,j
                                   sub   k,r0
                                   mov   r0,i
                                   mov   j,r0
                                   sub   k,r0
                                   mov   r0,i
                                   mov   j,r0
                                   dec   r0
                                   mov   r0,j

                         (b)

Figure 20.   PDP-11 assembly code for two
equivalent sets of C language statements
(a) using increment/decrement feature;
(b) using addition and subtraction

There is little, if any, mathematical rigor associated
with these methods, and they thus are very similar to
the kinds of optimizations which an assembly language
programmer would make. This is the major reason for the
crossover in relative efficiency between assembly language
and high-level language programs as program size increases
(see Section II.A). For the high-level language the special
cases must be foreseen by the compiler writer. Since he
probably will overlook some, the compiler will generate some
code which is obviously inefficient, as in Figure 20(b).
Nevertheless, those optimizations which can be applied to
cases foreseen by the compiler designer will be applied
consistently by the compiler every time the appropriate
conditions are satisfied. The assembly language programmer,
on the other hand, will easily spot the kinds of
inefficiencies shown in the example (and in the example of
Figures 15-17), but he may not be consistent in applying
optimizations and may not recognize others because of the
complexity of the program. The inefficiencies contributed by
these two factors tend to build up rapidly as the assembly
language program increases in size.

2.   Architecture-Dependent

Architecture-dependent optimizations are global in
nature and depend on the architecture of the object machine
but not the instruction set. Examples of architectural
features which are considered are the number of registers,
the number of processing elements, and the degree of
pipelining. As can be seen, these types of optimizations
generally involve resource allocation and thus would be very
important in compiling for microprogrammable and modular
systems. The reason these are considered to be global
optimizations is that the resource requirements of diverse
segments of a program must be considered when making the
allocations.

The important register-allocation problem fits into
this category, since its solution depends on the numbers and
types of registers available in the architecture. The
particular machine instructions are not important in this
case. It should be recalled that poor register allocation
was one of the major causes of inefficiency in the code of
Figure 17. Algorithmic solutions have been found for the
register-allocation problem for simple straight-line
(non-looping) programs [52], but a general solution is either
not possible or not practical. The former is usually the case
in programs which contain conditional branches, since the
flow of execution of the program is almost always unknown at
compile time, and this information is needed for an optimal
solution. The latter is usually true for long programs,
even if they contain no loops, since an optimal solution
would require an analysis of the entire program and an
enumeration of all possible combinations of register
assignments. Freiburghouse [20] has recently presented a
method for solving the register-allocation problem which
takes advantage of information which can be accumulated
during the normal course of compilation and which appears to
give results closer to the optimum than other proposed
solutions.

As discussed in Section VI.C, parallelism is an
important feature in firmware systems, and the generation of
code to take maximum advantage of this parallelism is
another architecture-dependent optimization problem. An
important use of parallelism in improving execution of a
program lies in the area of reducing the time required for
iterative segments of code. For example, the PL/M code of
Figure 21 could be translated into more time-efficient code
if several arithmetic units were available than if only one
were available.


    DO I = 1 TO 20;
        A = (B(I) + B(I+1)) * 2;
        C(I) = C(I) + A;
        D(I) = D(I) - A;
    END;

Figure 21.   Iterative code for which
parallel processing would be useful


As is usually the case in optimization problems,
there are tradeoffs which must be made when dealing with
parallelism. The two methods shown in Figure 22 for
calculating an expression can be used to illustrate this
point. If code were being generated for a machine with only
one register the scheme in Figure 22(a) would be better than
that of Figure 22(b), while the reverse would be true for a
machine with four multipliers and four registers. For a
machine with fewer than four multipliers, though, it is not
obvious which method would be better. In such situations an
analysis must be made of the various types of instructions
involved. One way to do this would be to assign weights to
the instructions based upon their execution times (e.g., a
multiplication instruction would have a greater weight than
an addition instruction) and then generate the code which
achieved the minimum total weight for the desired
computation.


Figure 22.   Tree structure for serial and
parallel computation of an expression. (a) Tree yielding
minimum number of registers; (b) Tree yielding
maximum inherent parallelism [52, p. 2]
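
The weighting scheme described above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions, not a method taken from the cited work: the
eight-operand product, the weights in WEIGHT, the register
counts, the spill penalty, and the names total_weight and
best_tree are all hypothetical.

```python
from math import ceil

# Hypothetical relative execution-time weights (assumed for illustration).
WEIGHT = {"MUL": 4, "LOAD": 2, "STORE": 2}

# Two assumed ways of forming a product of eight operands. "levels" lists
# how many independent multiplies sit at each depth of the tree.
TREES = {
    "serial":   {"levels": [1] * 7,   "registers": 1},
    "balanced": {"levels": [4, 2, 1], "registers": 4},
}

def total_weight(tree, multipliers, registers):
    """Weighted time for one tree on a machine with the given resources."""
    # Each level needs ceil(ops / multipliers) multiply steps.
    steps = sum(ceil(ops / multipliers) for ops in tree["levels"])
    # Every register the tree wants but the machine lacks costs one spill
    # (a store plus a reload).
    spills = max(0, tree["registers"] - registers)
    return steps * WEIGHT["MUL"] + spills * (WEIGHT["STORE"] + WEIGHT["LOAD"])

def best_tree(multipliers, registers):
    """Return (name, weight) of the tree with the minimum total weight."""
    costs = {name: total_weight(tree, multipliers, registers)
             for name, tree in TREES.items()}
    name = min(costs, key=costs.get)
    return name, costs[name]
```

Under these assumed weights, best_tree(1, 1) selects the serial
tree and best_tree(4, 4) selects the balanced tree, while the
intermediate cases depend on the particular weights chosen,
which is the point made in the text.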

3.   Architecture-Independent

The final and most general category consists of the
architecture-independent optimizations. Since these do not
depend on the architecture or the instruction set of the
object machine, they are obviously applicable to compiling
for user-definable architectures. These kinds of optimizations
can be applied to the intermediate language code
without considering the hardware features available. As in
the case of architecture-dependent optimization, these
optimizations are global in nature. The most commonly applied
techniques in this class are common subexpression elimination,
dead variable elimination, code motion, and constant
propagation. Since these techniques are widely discussed in
the literature (see, e.g., [4] for a good survey and [13]
for an application) only a brief description is presented
here.

Common subexpression elimination is the most widely
employed technique [52]. Basically it is concerned with
avoiding redundant computations, such as for the second
occurrence of "B*C" in Figure 23(a). Dead variables are
those which, beyond a given statement, never again appear on
the right-hand side of an assignment or are never again
referenced. In the first case the variable need not be kept
in a high-speed register, and in the second case it need not
any longer be assigned any memory at all. Code motion
refers to the movement of sections of code so as to reduce
the execution time of a program. For example, the section
of code shown in Figure 23(b) would be significantly
improved if the assignment to "D" were moved outside of the
loop. Constant propagation is really a special case of code
motion, since calculations involving only known constants
are moved from the execution phase of a program to the
compilation phase. The computations of "C" and "D" in Figure
23(c) provide examples of propagated constants.
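
As an illustration of common subexpression elimination, the
following Python sketch removes a repeated computation from a
single straight-line block. It is a simplified local version
written for this discussion, not the algorithm of any cited
work: the tuple format (target, left, op, right), the
temporaries T1 and T2, and the function name are assumptions,
and no global flow analysis is attempted.

```python
def eliminate_common_subexpressions(block):
    """block: list of (target, left, op, right) tuples in execution order.

    A repeated 'left op right' whose operands have not been reassigned is
    replaced by a copy from the variable that already holds the value.
    """
    available = {}   # (left, op, right) -> variable currently holding it
    out = []
    for target, left, op, right in block:
        key = (left, op, right)
        if key in available:
            out.append((target, available[key], None, None))  # plain copy
        else:
            out.append((target, left, op, right))
        # Assigning `target` invalidates every expression that reads it
        # and every expression whose value was held in it.
        available = {k: v for k, v in available.items()
                     if target not in (k[0], k[2]) and v != target}
        # Record the new expression unless it reads its own target.
        if target not in (left, right) and key not in available:
            available[key] = target
    return out

# Three-address form of statements like those in Figure 23(a).
block = [
    ("T1", "B", "*", "C"),
    ("A", "T1", "+", "D"),
    ("Q", "D", "+", "R"),
    ("T2", "B", "*", "C"),   # second occurrence of B*C
    ("X", "P", "+", "T2"),
]
optimized = eliminate_common_subexpressions(block)
```

In the result the fourth tuple becomes a copy of T1 rather
than a second multiplication. If "B" or "C" were assigned
between the two occurrences, the expression would be
recomputed.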

Architecture-independent optimization techniques
rely heavily on theoretical work and are amenable to the
application of sophisticated algorithms. They usually
involve a global flow analysis of the intermediate form of the
program and may rely on graph theory or matrix analysis.
Unfortunately most of these techniques are very complicated
and require large amounts of memory and time.


    A = B * C + D;
    Q = D + R;
    X = P + B * C;

              (a)


    DO I = 1 TO 1000;
        A(I) = B(I) * C(I);
        D = X * Y / Z + 50;
        B(I) = C(I) * D;
    END;

              (b)


    A = 3;
    B = C * D;
    C = A + 5;
    D = A * C + A;

              (c)


Figure 23.   Architecture-independent
optimization candidates. (a) Common subexpression,
(b) Code motion, (c) Constant propagation
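
Constant propagation over a straight-line block can be
sketched as follows. This is a minimal illustration, assuming
integer constants, the three operators shown, and no branches
or loops; the tuple format and the function name are
hypothetical, and the sample block is only modeled loosely on
Figure 23(c).

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate_constants(block):
    """block: list of (target, left, op, right); op is None for a plain copy.

    Operands are variable names (str) or integer constants. Any operation
    whose operands are all known at compile time is replaced by its value.
    """
    known = {}   # variable -> constant value known at this point
    out = []
    for target, left, op, right in block:
        # Substitute known constants for variable operands.
        left = known.get(left, left)
        right = known.get(right, right)
        if op is None and isinstance(left, int):
            known[target] = left
        elif op is not None and isinstance(left, int) and isinstance(right, int):
            left, op, right = OPS[op](left, right), None, None  # fold now
            known[target] = left
        else:
            known.pop(target, None)   # value is no longer a known constant
        out.append((target, left, op, right))
    return out

block = [
    ("A", 3, None, None),
    ("B", "C", "*", "D"),    # operands unknown here, left for run time
    ("C", "A", "+", 5),
    ("T", "A", "*", "C"),
    ("D", "T", "+", "A"),
]
folded = propagate_constants(block)
```

Here the assignments to "C", "T", and "D" are all reduced to
constants at compilation time, while the assignment to "B" is
left for execution time because its operands are not yet
known.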

Since a useful program usually contains many
branches and loops which make it impossible to know at
compilation time how often (if ever) many sections of code will
be executed, frequency analysis is sometimes employed in the
optimization process. By assigning a relative frequency or
weight to each block of code in a program, the programmer
allows the optimizer to perform a Monte Carlo simulation to
determine the "optimum" code sequence [52]. There have even
been proposals [24] to employ an adaptive optimization
process to perform the optimizations at run time. In such a
scheme a large portion of the effort would be devoted to
optimizing sections of code which are heavily used, since
they account for most of the execution time. Such a scheme
probably would not be practical for most real-time systems
unless the adaptive optimization were done during the
development process and the resulting optimizations were
applied to the final system in a non-adaptive mode.
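
The effect of programmer-supplied frequencies can be shown
with a small Python sketch. It is a deterministic
expected-value calculation rather than the Monte Carlo
simulation mentioned above, and the block names, frequencies,
costs, and the function rank_blocks are assumptions chosen
only for illustration.

```python
def rank_blocks(blocks):
    """blocks: dict of name -> (relative frequency, cost per execution).

    Returns (name, share of expected execution time) pairs, largest first,
    so that optimization effort can be directed at the heaviest blocks.
    """
    total = sum(freq * cost for freq, cost in blocks.values())
    shares = {name: freq * cost / total
              for name, (freq, cost) in blocks.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

ranking = rank_blocks({
    "initialization": (1, 50),
    "inner_loop": (1000, 8),
    "error_path": (0.01, 200),
})
```

With these assumed figures the inner loop accounts for more
than 99 percent of the expected execution time, so it is the
block on which optimization effort would be spent first.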

C.   APPLICATION 

From the discussion in Sections VII.A and VII.B it can
be seen that compiler optimization is a complex problem. A
good optimizing compiler, in effect, attempts to match wits
with a good assembly language programmer. In order to do
this effectively the compiler must have a great deal of
"artificial intelligence" built into it, and this is
something which, unfortunately, is difficult to do.
"Optimizations originating in the academic and scientific community
tend to be global, while, until recently, manufacturers have
concentrated on local and machine-dependent techniques."
[52, p. 1] More efficient algorithms must be developed in
order to allow the academic solutions to become more useful
in practical compilers. More general and powerful
techniques for handling local and machine-dependent
optimizations must also be found. For the types of systems under
consideration here, many of the techniques discussed above
are already practical, since compilation costs are only a
small part of total development cost.

A great deal of care must be exercised in the application
of optimization techniques in code generation. Many of
the techniques involve reordering of arithmetic operations,
and this can lead to unexpected and often undesired results
(e.g., from a numerical analysis point of view). Thus it
appears that a great deal of work remains to be done in this
area. It is evident, though, that as better techniques are
developed and the cost of current techniques (in memory and
time) is brought lower, high-level languages will continue
to become more attractive.
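
The hazard of reordering arithmetic can be seen in a short
Python example using ordinary double-precision floating-point
numbers. The values are chosen purely for illustration; the
function names are hypothetical.

```python
def left_to_right(a, b, c):
    """Evaluate a + b + c in the order written by the programmer."""
    return (a + b) + c

def reordered(a, b, c):
    """Evaluate the same sum after an optimizer regroups the operands."""
    return a + (b + c)

a, b, c = 1.0e16, -1.0e16, 1.0
as_written = left_to_right(a, b, c)   # large terms cancel first, then c is added
regrouped = reordered(a, b, c)        # c is absorbed by rounding in b + c
```

The two groupings are algebraically identical, yet the first
yields 1.0 and the second 0.0, because the small operand is
lost to rounding when it is added to the large one first.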

Until some breakthrough comes in the artificial
intelligence area the most practical techniques will probably
require the programmer to provide some input to the
optimization process. He might specify that speed is most
important for certain sections of code and that the amount of
memory utilized should be minimized for other sections. He
might also specify the probabilities of certain branches in
the program (as has been done in some compilers since the
1950's [52]). The computer then will perform the "dirty
work" much more effectively than a human writing in assembly
language.




VIII.   THE  CONFIGURATION-INDEPENDENT  COMPILER 


The previous seven chapters have discussed features
available in current compilers and features which appear
feasible for future compilers. In this chapter an attempt
is made to tie together some of these ideas and discuss the
possible functioning and structure of a compiler for which a
target machine and language are not necessarily specified
prior to compilation. The level of interest in developing
such a compiler is indicated by the increasing amount of
work being done on machine-independent high-level
microprogramming and system programming languages [18, 38, 40, 47, 57].

Ramamoorthy and Tsuchiya [47] have demonstrated a
language which appears to have many of the desired features
and which can produce control code for a complex
microprogrammable machine. Their SIMPL (Single Identity
Microprogramming Language) is intended to be machine-independent;
however, it does not appear that they have yet addressed the
problem of specifying the machine organization to the
compiler in a flexible manner.

Wilcox [61] has looked at the latter problem but has
based his work on the concept of a machine-independent
assembler. This assembler is to be used for generating
control code for digital systems built with QED functional
modules [56]. The nature of this problem is very similar to
that considered by Ramamoorthy and Tsuchiya, and there does
not seem to be any practical reason for not extending
Wilcox's concept to a machine-independent compiler.

A.   THE  IDEAL  COMPILER 

The truly ideal compiler would be one which would accept
an algorithm from the programmer in a universal programming
language, select the most appropriate hardware for the job,
and produce the code for controlling the hardware.
Obviously the compiler would require more input than just a
statement of the algorithm. It would need to have information on
what hardware was available and the operational constraints
to be placed on the resulting system.

A compiler which could function as described above is a
goal for which compiler designers can strive, but it is one
which will probably require many more years to achieve. The
reason for this is the reason that computers have not taken
over all other engineering disciplines--there are too many
subtle tradeoffs to be made in designing a system. The
relationships between many of the variables involved cannot
be quantified, and a great deal of experience and intuition
is required to produce a good design. A large part of any
design effort is concerned with optimization of some sort,
and, as discussed in Chapter VII, this involves the area
which the computer scientist labels artificial intelligence.

Short of the ideal, the programmer (system designer)
will have to specify a few possible hardware configurations
along with the optimization functions and constraints. The
compiler will then make some simple tradeoffs among the
various configurations, choose the "optimal" one, and
produce the optimal code. In a given design, for example, the
compiler might decide that a system with three multiplier
modules would be better than one with two or four multiplier
modules.
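
A tradeoff of this kind can be sketched in Python as the
minimization of a user-supplied objective over a few
candidate configurations. The workload, the two weights, and
the names schedule_steps and choose_configuration are
assumptions made for this illustration only.

```python
from math import ceil

def schedule_steps(levels, multipliers):
    """Multiply steps needed when levels[i] independent multiplies are
    available at depth i and `multipliers` modules work in parallel."""
    return sum(ceil(ops / multipliers) for ops in levels)

def choose_configuration(levels, candidates, time_weight, module_cost):
    """Score each candidate number of multiplier modules and pick the
    one with the lowest weighted sum of execution time and hardware cost."""
    scores = {m: time_weight * schedule_steps(levels, m) + module_cost * m
              for m in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical workload: three levels of three multiplies, then one more.
best, scores = choose_configuration([3, 3, 3, 1], [2, 3, 4],
                                    time_weight=1.0, module_cost=1.0)
```

For this assumed workload three modules score best: a fourth
module adds cost without shortening the schedule, and two
modules lengthen it. Different weights supplied by the
designer would of course shift the choice.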

It will probably be several years before even this
reduced-capability compiler can be implemented. Based upon
what appears feasible within the next few years, the "ideal"
compiler would be even more restricted. As indicated in
Section I.A this compiler would have several inputs. In
addition to the algorithm, the programmer would specify the
hardware configuration, the format of the control code, and
some simple optimization information. A conceptual block
diagram of such a compiler is shown in Figure 24. In actual
practice it may be difficult to divide the compiler into a
set of neat boxes with definite flow of action, a fact which
is suggested by the dashed line in Figure 24. In other
words, there will probably be a strong interaction among the
various sections of such a compiler.

It will probably be especially difficult to distinguish
the architecture-dependent optimization phase from pass 2.
These two phases relate fairly closely to the final two
steps in a SIMPL compilation--the concurrency and timing
analysis step and the microoperation timing optimization
step. (The first two steps are syntactic analysis and
semantic analysis, which parse the source program and break it
into a series of subblocks.) The concurrency and timing
analysis step "... examines symbolic code in each subblock
to detect concurrently executable microoperations and to
determine their feasible execution timing." [47, p. 796]
Next, the microoperation timing optimization step "...
introduces complete machine dependence.... The hardware
organization and operating characteristics are defined by the
microinstruction definition that is represented internally
in the compiler." [47, p. 797]


Figure 24.   Conceptual diagram of a compiler
for user-definable architectures

B.   INTRODUCING  MACHINE  DEPENDENCE 

Probably the most difficult problem which will be
encountered in designing a compiler for user-definable
architectures will be that of introducing machine dependence.
Compilers for fixed architectures have machine-dependent
information scattered through all of their phases. The
configuration-independent compiler, on the other hand, must
have machine-dependent information localized to as few areas
as possible and must be structured in such a way as to make
it as easy as possible to change this information. It is
because machine dependence has to be introduced
at some stage in any practical compiler that the term
"compiling for user-definable architectures" has been used
in this thesis. The technically inaccurate term
"machine-independent compiler" is often encountered and has the same
meaning.

As indicated in Figure 24, information on optimization,
machine organization, and instruction formats will be
tabulated by the compiler. After suitable processing, the
information will be loaded into tables in much the same way
that the information from the algorithm is loaded into the
symbol table. The architecture-dependent optimization phase
and pass 2 will thus be "table-driven"; i.e., they will
extract information from the various tables and use this
information to make optimization and code generation
decisions. In the sense that they use information from the
symbol table to generate control code, the second passes of the
two current Intel PL/M compilers for the 8008 and the 8080
microprocessors can be considered to be partially
table-driven.
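
The table-driven idea can be illustrated with a small Python
sketch in which one code generator serves two machines simply
by being handed a different table. The intermediate code uses
the LIT, VAL, and OPR prefixes of Appendix A, but everything
else is assumed for illustration: the two machines, their
mnemonics, the table layout, and the function generate are
hypothetical and do not describe any actual instruction set.

```python
# Hypothetical machine description tables: each intermediate-language
# operation maps to a list of instruction templates for that machine.
STACK_MACHINE = {
    "LIT": ["PUSHI {arg}"],
    "VAL": ["PUSHM {arg}"],
    "OPR ADD": ["ADDS"],
}

REGISTER_MACHINE = {
    "LIT": ["LDI R0,{arg}", "PUSH R0"],
    "VAL": ["LD R0,{arg}", "PUSH R0"],
    "OPR ADD": ["POP R1", "POP R0", "ADD R0,R1", "PUSH R0"],
}

def generate(intermediate_code, table):
    """Expand (prefix, argument) pairs into target code using only the
    information held in `table`; the routine itself knows no machine."""
    output = []
    for prefix, arg in intermediate_code:
        key = f"OPR {arg}" if prefix == "OPR" else prefix
        if key not in table:
            raise ValueError(f"no template for {key} on this machine")
        output.extend(template.format(arg=arg) for template in table[key])
    return output

# Intermediate code for the expression 2 + X.
program = [("LIT", 2), ("VAL", "X"), ("OPR", "ADD")]
stack_code = generate(program, STACK_MACHINE)
register_code = generate(program, REGISTER_MACHINE)
```

Retargeting this generator means supplying a new table, not
rewriting the routine. A practical pass 2 would need far
richer tables (register status, timing, instruction formats),
as the following paragraphs indicate.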

Tirrell [57] has reported work involving the use of a
table-driven compiler for microprogramming. In his
compiler, tables containing machine-dependent information could
be loaded prior to compilation or could be generated during
compilation. One table was used for indicating the status
of the various hardware registers and indicators, while a
second table was used to store the basic microinstruction
patterns. Other tables were used as aids in optimizing the
generated code. Most of the optimizations involved the
arrangement of elementary operations into efficient
microinstruction words (i.e., words which take maximum advantage of
parallelism).

Another concept which deserves attention in the design
of a compiler for user-definable architectures is that of
decision-logic tables [5, 44]. Initially conceived to
replace flow charts in business programming applications,
decision tables have developed an extensive body of theory
to enable their efficient use. A decision table consists of
a group of alternatives for a given situation and a set of
actions to be taken for each alternative. In essence, this
technique results in a tabular program rather than a
table-driven program.
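
A decision table of the kind just described might be
represented as follows. The sketch is in Python and is purely
illustrative: the conditions, the actions, and the function
decide are hypothetical and are not drawn from the cited
decision-table literature.

```python
# Each rule pairs one alternative (a tuple of condition values, with None
# meaning "don't care") with the actions to take for that alternative.
# Conditions: (operand is a power of two, hardware multiplier present)
MULTIPLY_TABLE = [
    ((True,  None),  ["emit shift-left"]),
    ((False, True),  ["emit multiply instruction"]),
    ((False, False), ["emit call to multiply subroutine"]),
]

def decide(table, situation):
    """Return the actions of the first rule whose conditions match."""
    for conditions, actions in table:
        if all(want is None or want == have
               for want, have in zip(conditions, situation)):
            return actions
    raise LookupError("no rule matches this situation")

actions = decide(MULTIPLY_TABLE, (False, False))
```

The table itself is the program: changing the compiler's
behavior for a new hardware configuration would mean editing
rows of the table rather than rewriting control flow.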

In his discussion of register-transfer languages,
Barbacci concluded with some very pertinent remarks.


IX.   CONCLUSIONS  AND  RECOMMENDATIONS

As digital large scale integrated circuits and
functional modules continue to have a greater impact on electronic
system design, the need for improved software design and
application will become ever more critical in producing
reliable, cost-effective systems. Most of the concepts
discussed in this thesis have been in existence for a number of
years, but current hardware development trends demand that
greater emphasis be placed on translating these concepts
into realities.

One of the key milestones in the effort to provide
better tools for the design of systems using the new
components will be the development of suitable high-level
programming languages for describing the algorithm. There are
well over 100 high-level languages available today, each
designed to help solve a particular problem. "... [O]ne may
question the need or desirability of all these languages.
On the other hand, for the convenience of the user, he
should be allowed to choose a language that he is
comfortable with and which best suits his application." [48, p. 3]
The PL/M language, developed by Intel Corporation, has been
successfully used by firmware designers and may be able to
be used as a base for new, more comprehensive languages.
Even if completely new languages are developed, they will
probably bear a strong resemblance to PL/M.

Major effort will have to be directed toward the
development of compiler facilities which allow user
specification of the hardware aspects of his design. This will
require the development of a good hardware description
language, which may or may not be a subset of the language
discussed above for describing the algorithm. In any event,
the compiler will have the capability of manipulating this
hardware information in such a way as to facilitate the
generation of control code.

In order for this compiler to be accepted by system
designers, it will have to generate "good" control code,
with the specification of goodness being provided by the
user. Thus there is a need for the continued development of
practical compiler optimization techniques. In all of the
work to be done, the optimization problems will probably be
the most difficult to solve and the most crucial for the
success of the overall task.

The work discussed in Chapter IV has been sufficient to
indicate the feasibility of developing a high-level language
for user-definable architectures; however, there are many
questions left to be answered and several important steps
which need to be taken. The development of a formal
technique for describing the semantics of a programming language
should have a high priority in this regard. Despite all of
the theoretical work which has been done to improve the
syntax analysis and parsing processes in compilers, very little
has been done to formalize the semantic analysis and code
generation processes [30, 60]. The code generation process
for the PL/M compiler is just another translation (from
intermediate language to machine language) and might be able
to be performed in a manner similar to the parsing of the
source language and generation of intermediate language
code. The use of a push-down automaton for this process
should be investigated.

Error recovery during source language parsing is another
area which deserves additional attention. It is desirable
to provide the programmer with as much information as
possible, and the method discussed in Section IV.B.4 is
relatively simple. Attention in this area should also be devoted
toward more efficient storage of error messages in order to
help minimize the size of the compiler. One technique for
doing this would involve the design of messages which can be
partitioned into a relatively small number of common
phrases. Detailed messages could then be constructed from
these phrases.
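
The phrase-partitioning technique might look like the
following Python sketch. The message texts are taken from the
flag() calls in the grammar of Appendix B; the phrase list,
the message codes E1 through E5, and the function render are
assumptions introduced for this illustration.

```python
# Common phrases are stored once; each message is a short tuple of indices.
PHRASES = (
    "identifier",     # 0
    "invalid here",   # 1
    "required",       # 2
    "incorrect",      # 3
    "illegal",        # 4
    "procedure",      # 5
    "name",           # 6
    "label",          # 7
    "redeclared",     # 8
)

MESSAGES = {
    "E1": (0, 1),      # identifier invalid here
    "E2": (0, 2),      # identifier required
    "E3": (3, 0),      # incorrect identifier
    "E4": (4, 5, 6),   # illegal procedure name
    "E5": (7, 8),      # label redeclared
}

def render(code):
    """Build the detailed message text for a message code."""
    return " ".join(PHRASES[index] for index in MESSAGES[code])
```

The word "identifier" is stored once but appears in three
messages. The saving grows with the number of messages that
share phrases, at the cost of a small lookup step when an
error is reported.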

The next step in continuing the work described in
Chapter IV should be the design and implementation of a
second pass for the PL/M compiler. Several specific
recommendations can be made here for future work in this area.
First, a routine will have to be written to transfer the
symbol table to a disk file. This file, along with the
intermediate language file and the initial value file, would
then be used as input for the second pass. The key to
successful development of a "machine-independent" second pass
will be the availability of a suitable hardware description
language and compiler. When they become available, they
should be tested by using them to write a description of the
Intel 8008 or 8080 microprocessor. In order to produce the
control code, routines will then have to be written to store
the necessary information from this description in tables
and to manipulate these tables according to the information
received from pass 1. Until a suitable hardware description
language is available, a more conventional pass 2 could be
written, in the C language, for the PL/M compiler. This
would provide a vehicle for testing various optimization
techniques. Finally, optimization inputs must be defined,
and the methods for utilizing them must be developed.


APPENDIX  A
PL/M  INTERMEDIATE  LANGUAGE  CODES


Prefixes

ADR ..... Load Address of Symbol
LIN ..... Line Number Marker
LIT ..... Load Literal Value
OPR ..... Stack Operator
VAL ..... Symbol: Load Value
          Procedure: Load Address


Operators

ADC ..... Add with Carry
ADD ..... Add
AND ..... Logical And
ARG ..... Procedure Argument
AX1 ..... Auxiliary 1
AX2 ..... Auxiliary 2
AX3 ..... Auxiliary 3
BIF ..... Built-In Function
CSE ..... Case Index Operation
CVA ..... Convert to Address (Double Byte)
DAT ..... Data Start/Finish
DEL ..... Delete
DIS ..... Disable Interrupts
DIV ..... Divide
DRT ..... Default Return (End of Procedure)
ENA ..... Enable Interrupts
ENB ..... Enter Block
END ..... End of Do Group
ENP ..... Enter Procedure
EQL ..... Test for Equal
GEQ ..... Test for Greater Than or Equal
GTR ..... Test for Greater Than
HAL ..... Halt
HIV ..... Extract High Order Byte
INC ..... Increment
INX ..... Subscript Index
IOR ..... Logical Inclusive Or
LEQ ..... Test for Less Than or Equal
LOD ..... Load
LOV ..... Extract Low Order Byte
LSS ..... Test for Less Than
MUL ..... Multiply
NEG ..... Negative
NEQ ..... Test for Not Equal
NOP ..... No Operation
NOT ..... Logical Negate
ORG ..... Origin
PRO ..... Procedure Call
REM ..... Remainder
RET ..... Return
RTL ..... Rotate Left
RTR ..... Rotate Right
SBC ..... Subtract with Carry
SFL ..... Shift Left
SFR ..... Shift Right
STD ..... Store Destructive
STO ..... Store
SUB ..... Subtract
TRA ..... Unconditional Transfer
TRC ..... Conditional Transfer
XCH ..... Exchange
XOR ..... Logical Exclusive Or
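
Since OPR is described above as a stack operator, the
intermediate language can be read as code for a simple stack
machine. The Python sketch below evaluates a short sequence
under that reading. The exact semantics of each code are
assumptions inferred only from the one-line descriptions in
this appendix, the handful of operators covered is chosen for
brevity, and the function run is hypothetical.

```python
import operator

# Assumed meanings for a few binary operators from the table above.
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(intermediate_code, memory):
    """Evaluate (prefix, argument) pairs on a value stack and return it."""
    stack = []
    for prefix, arg in intermediate_code:
        if prefix == "LIT":                       # load literal value
            stack.append(arg)
        elif prefix == "VAL":                     # load value of symbol
            stack.append(memory[arg])
        elif prefix == "OPR" and arg in BINARY:   # apply to top two entries
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[arg](left, right))
        elif prefix == "OPR" and arg == "NEG":    # negate top entry
            stack.append(-stack.pop())
        else:
            raise ValueError(f"unsupported code: {prefix} {arg}")
    return stack

# Assumed translation of the expression (B + 1) * 2.
result = run([("VAL", "B"), ("LIT", 1), ("OPR", "ADD"),
              ("LIT", 2), ("OPR", "MUL")], {"B": 5})
```

With B equal to 5 the stack is left holding the single value
12. Codes such as STO, STD, TRA, and TRC would require a
model of memory addresses and control flow that this sketch
deliberately omits.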


APPENDIX  B
PROGRAM  LISTINGS


FILE:   m.gram
PL/M  Syntax  and  Semantics


%term identifier number string

%{      /* declarations used by actions and programs */

int ii,jj; char *kk, ll;

#include "m.def"
#include "m.decl"

%}

%%      /* beginning of grammar rules section */

program:        statementlist   /* 1 */


statementlist:  statement       /* 2 */
        |       statementlist statement     /* 3 */


statement:      basicstatement  /* 4 */
                =   {npush = 0; }
        |       ifstatement     /* 5 */
                =   {npush = 0; }


basicstatement: assignment ';'  /* 6 */
                {while ($1--)
                    {if (fixv[sp] > 0) emit(OPR,XCH);
                     else {setsy(symloc[sp]);
                           emit(ADR,getsyno());}
                     pop(1);
                     if ($1 > 0) emit(OPR,STO);
                     else emit(OPR,STD); } }
        |       group ';'       /* 7 */
        |       proceduredefinition ';'     /* 8 */
        |       returnstatement ';'     /* 9 */
        |       callstatement ';'       /* 10 */
        |       gotostatement ';'       /* 11 */
        |       declarationstatement ';'    /* 12 */
        |       'halt' ';'      /* 13 */
                =   {emit(OPR,HAL); }
        |       'enable' ';'    /* 14 */
                =   {emit(OPR,ENA); }
        |       'disable' ';'   /* 15 */
                =   {emit(OPR,DIS); }
        |       ';'     /* 16 */
        |       labeldefinition basicstatement  /* 17 */
        |       error ';'       /* ERROR */ /* 18 */
                =   {errfix(); }


ifstatement:    ifclause statement      /* 19 */
                =   {emit(DEF,spop()); }
        |       ifclause truepart statement     /* 20 */
                =   {emit(DEF,spop()); }
        |       labeldefinition ifstatement     /* 21 */


ifclause:       'if' expression 'then'  /* 22 */
                {emit(VAL,spush(nsym++));
                 emit(OPR,TRC); }


truepart:       basicstatement 'else'   /* 23 */
                {ii = spop();
                 emit(VAL,spush(nsym++));
                 emit(OPR,TRA);
                 emit(DEF,ii); }


group:          grouphead ending        /* 24 */
                {exitblk();
                 if ($2 >= 0)
                    {flag("identifier invalid here");
                     pop(1); }
                 switch($1 & 03)
                 {case 0:       /* simple group */
                    emit(OPR,END); break;
                  case 1:
                    if ($1 & 04)    /* stepdef w/BY */
                       {emit(VAL,spop());
                        emit(OPR,TRA); emit(DEF,spop());}
                    else            /* stepdef w/o BY */
                       {emit(VAL,ii = symfind(sp));
                        emit(OPR,INC);
                        emit(ADR,ii); emit(OPR,STD);
                        ii = spop(); emit(VAL,spop());
                        emit(OPR,TRA); emit(DEF,ii);}
                    pop(1); break;
                  case 2:       /* while group */
                    ii = spop(); emit(VAL,spop());
                    emit(OPR,TRA); emit(DEF,ii); break;
                  case 3:       /* case group */
                    ii = spop(); spop(); jj = $1 >> 2;
                    kk = csp + (jj << 1);
                    emit(DEF,maketwo(*kk,*(kk+1)));
                    emit(OPR,CSE);
                    while (jj--)
                       {kk =- 2;
                        emit(VAL,maketwo(*kk,*(kk+1)));
                        emit(OPR,AX2); }
                    emit(DEF,ii);
                    for (jj = ($1 >> 2) + 1; jj--; ) spop();
                    break;
                 }
                }


grouphead:      'do' ';'        /* 25 */
                {enterblk(); emit(OPR,ENB); $$ = 0; }
        |       'do' stepdefinition ';'     /* 26 */
                =   {enterblk(); $$ = 1 + $2; }
        |       'do' whileclause ';'        /* 27 */
                =   {enterblk(); $$ = 2; }
        |       'do' caseselector ';'       /* 28 */
                =   {enterblk(); $$ = 3;
                     emit(VAL,spush(nsym++)); emit(OPR,AX1);
                     emit(DEF,spush(nsym++)); spush(nsym++); }
        |       grouphead statement     /* 29 */
                =   {if ((($$ = $1) & 03) == 3)
                        {emit(VAL,ii = spop()); emit(OPR,TRA);
                         emit(DEF,spush(nsym++));
                         spush(ii); $$ =+ 4;} }


stepdefinition: variable replace expression iterationcontrol    /* 30 */
                =   {$$ = $4; }


iterationcontrol:   to expression   /* 31 */
                =   {$$ = 0; emit(OPR,LEQ);
                     emit(VAL,spush(nsym++)); emit(OPR,TRC); }
        |       to expression by expression     /* 32 */
                =   {emit(VAL,ii = symfind(sp)); emit(OPR,ADD);
                     emit(ADR,ii); emit(OPR,STD); $$ = 4;
                     emit(VAL,spop()); emit(OPR,TRA);
                     emit(DEF,spop()); }


whileclause:    while expression    /* 33 */
                =   {emit(VAL,spush(nsym++));
                     emit(OPR,TRC); }


caseselector:   'case' expression   /* 34 */


proceduredefinition:    procedurehead statementlist ending  /* 35 */
                =   {if ($3 < 0) flag("identifier required");
                     else {setsy($1); ii = getsyno();
                           if (ii != symfind($3))
                              flag("incorrect identifier");
                           pop(1); }
                     exitblk(); emit(OPR,END);
                     emit(OPR,DRT); emit(DEF,spop()); }


procedurehead:  procedurename ';'   /* 36 */
                =   {procode($$ = $1); }
        |       procedurename type ';'      /* 37 */
                =   {setsy($$ = $1); setprec($2); procode($1); }
        |       procedurename parameterlist ';'     /* 38 */
                =   {setsy($$ = $1); setlen($2); procode($1); }
        |       procedurename parameterlist type ';'    /* 39 */
                =   {setsy($$ = $1); setlen($2);
                     setprec($3); procode($1); }
        |       procedurename 'interrupt' number ';'    /* 40 */
                =   {procode($$ = $1); }


procedurename:  identifier ':' 'procedure'  /* 41 */
                =   {if (symloc[$1] >= curlev)
                        flag("illegal procedure name");
                     fixv[$1] = 0; $$ = sytop;
                     enter($1,prot,0,0); compress($$,1);
                     pop(1); emit(OPR,ENP);
                     enterblk(); }


parameterlist:  parameterhead identifier ')'    /* 42 */
                =   {if (acnt >= 63)
                        {flag("too many parameters"); acnt = 62;}
                     setsy($1);
                     fixv[$2] = 0; $$ = 1 + acnt + (getlast() << 6);
                     enter($2,undef,0,1); compress($1,acnt);
                     pop(1); }


parameterhead:  '('     /* 43 */
                =   {$$ = sytop; acnt = 0; }
        |       parameterhead identifier ','    /* 44 */
                =   {$$ = $1; acnt++; fixv[$2] = 0;
                     enter($2,undef,0,1);
                     pop(1); }


ending:         'end'   /* 45 */
                {$$ = -1; }
        |       'end' identifier    /* 46 */
                =   {$$ = $2; }
        |       labeldefinition ending  /* 47 */
                =   {$$ = $2; }


1 abel def i ni t i on:   identifier  ':'     /*  48  */ 
=    (labflag++; 

if  ((ii  =  symloclSU)  >=  curlev) 
{set  sy  ( i  i  )  ; 
i  f  (get len()  ) 

(ii  =  getsizeO  +  finfo  +  l; 
*(symbol  +  ii)  =&  03; 
emit(DLFfgetsynoO); 
} 
else  flagClabel  redec  1  ared"  ) ; 
> 
el  se 

( f  i  x  v  ( $  1 J  =  0  ; 

5>$  =  sytop; 

enter ( $1 , 1 abt ,0, 0) ;  compress ($$, 1 ) ; 

emi  t (DEF , nsym-  1 ) ; 

>    > 


    |   number     /*  49  */
        {emit(LIT,$1);  emit(OPR,ORG);  }


returnstatement:   'return'     /*  50  */
        {emit(LIT,0);  emit(OPR,RET);  }
    |   'return' expression     /*  51  */
        {emit(OPR,RET);  }


callstatement:      'call' variable   /*  52  */
        =    {setsy(symloc[$2]);  emit(VAL,getsyno());
              if ((ii = gettype()) == prot)  emit(OPR,PRO);
              else {if (ii == cprot)  emit(OPR,BIF);
                    else flag("variable not a procedure name");}
              pop(1);   }

gotostatement:     goto identifier    /*  53  */
        =    {emit(VAL,symfind($2));  emit(OPR,TRA);  pop(1);  }
    |   goto number  /*  54  */
        =    {emit(LIT,$2);  emit(OPR,TRA);  }


goto:   'go' 'to'    /*  55  */
    |   'goto'       /*  56  */


declarationstatement:  'declare' declarationelement  /*  57  */
    |   declarationstatement ',' declarationelement   /*  58  */


declarationelement:     typedeclaration     /*  59  */
        =    {setsy(curlev - *curlev);
              if (gettype() == prot)
                  {ii = getlen();
                   while (ii--)
                      {
                       setsy(symbol + finfo + getsize() + 2);
                       if (gettype() == vart && getprec() == 0)
                           setprec($1);
                      }
                  }    }
    |   identifier 'literally' string    /*  60  */
        =    {
              /*  enter a macro definition  */
              symbol = sytop;
              setname($1);     /*  fills size,name,hcoll  */
              fixhcoll();
              settype(mact);
              /*  set the macro definition size  */
              setsym((ii = getsize() + finfo),
                     getvarc(jj = var[$3]));
              setchar(++ii,jj);    /*  fills the macro definition  */
              /*  note that last field filled is at end of entry  */
              syfin();
              pop(2);
             }
    |   identifier datalist  /*  61  */
        =    {if (symcheck($1))
                  {fixv[$1] = 0;
                   ii = sytop;  jj = ($2 > 63);
                   enter($1, jj ? lvect : svect,1,$2);}
              if (!jj)  compress(ii,1);
              else {setsy(ii);  fixhcoll();}
              pop(1);  emit(OPR,DAT);  emit(DEF,spop());  }


datalist:  datahead constant ')'      /*  62  */
        =    {$$ = $1 + concode($2,datcon);   }

datahead:   'data' '('    /*  63  */
        =    {$$ = 0;  emit(VAL,spush(nsym++));
              emit(OPR,TRA);  emit(OPR,DAT);
              emit(DEF,nsym);  }
    |   datahead constant ','    /*  64  */
        =    {$$ = $1 + concode($2,datcon);   }
    ;

typedeclaration:  identifierspecification type    /*  65  */
        =    {$$ = $2;  ii = $1;
              if ($2 != 1)  change(vart,$2,1,acnt);
              compress($1,acnt);  }
    |   boundhead number ')' type    /*  66  */
        =    {if (!$4)  flag("illegal declaration");
              $$ = $4;  ii = $1;
              if ($2 > 63)  change(lvect,$4,$2,acnt);
              else {change(svect,$4,$2,acnt);
                    compress($1,acnt);  }  }
    |   typedeclaration initiallist  /*  67  */
        =   {$$ = $1;  }


type:  'byte'   /*  68  */
        =   {$$ = tt = 1;  }
    |   'address'  /*  69  */
        =    {$$ = tt = 2;  }
    |   'label'  /*  70  */
        =    {$$ = tt = 0;  }


boundhead:  identifierspecification '('     /*  71  */
        =    {$$ = $1;  }


identifierspecification:  variablename    /*  72  */
        =    {$$ = $1;  if (jj)  acnt = 1;  else acnt = 0;  }
    |   identifierlist variablename ')'  /*  73  */
        =    {if (acnt++)  $$ = $1;  else $$ = $2;}


identifierlist:  '('      /*  74  */
        =    {acnt = 0;}
    |   identifierlist variablename     /*  75  */
        {if (acnt++)  $$ = $1;  else $$ = $2;}


variablename:  identifier     /*  76  */
        =    {if (symcheck($1))
                  {fixv[$1] = 0;
                   $$ = sytop;
                   enter($1,vart,1,1);}
              pop(1);  }
    |   basedvariable identifier    /*  77  */
        =    {if (fixv[$2] != foundv)
                  flag("base not defined");
              else {ii = getsyno();
                    setsy($1);  setbsyno(ii);  }
              $$ = $1;
              pop(1);  }

basedvariable:  identifier 'based'    /*  78  */
        =    {if (symcheck($1))
                  {fixv[$1] = basev;
                   $$ = sytop;
                   enter($1,vart,1,1);}
              pop(1);  }

initiallist:        initialhead constant ')'      /*  79  */
        =    {$$ = $1 + concode($2,tt);
              if (tt)
                  {putw($$,&buf3);  setsy(ii);
                   putw(getsyno(),&buf3);  }  }
    ;

initialhead:        'initial' '('     /*  80  */
        =    {$$ = 0;  if (!tt)
                  flag("initial not allowed here");  }
    |   initialhead constant     /*  81  */
        {$$ = $1 + concode($2,tt);   }


assignment:   variable replace expression     /*  82  */
        =    {$$ = 1;  }
    |   leftpart assignment    /*  83  */
        =    {$$ = ++$2;  }


replace:     '='     /*  84  */

leftpart:    variable ','     /*  85  */


expression:  logicalexpression    /*  86  */
    |   variable ':' '=' logicalexpression     /*  87  */
        =    {if (fixv[$1])  emit(OPR,XCH);
              else emit(ADR,symfind($1));
              emit(OPR,STO);  pop(1);  }


logicalexpression:  logicalfactor     /*  88  */
    |   logicalexpression 'or' logicalfactor    /*  89  */
        =    {emit(OPR,IOR);  }
    |   logicalexpression 'xor' logicalfactor   /*  90  */
        =    {emit(OPR,XOR);  }

logicalfactor:      logicalsecondary  /*  91  */
    |   logicalfactor 'and' logicalsecondary     /*  92  */
        =    {emit(OPR,AND);  }

logicalsecondary:   logicalprimary    /*  93  */
    |   'not' logicalprimary    /*  94  */
        =    {emit(OPR,NOT);  }

logicalprimary:  arithmeticexpression     /*  95  */
    |   arithmeticexpression relation arithmeticexpression   /*  96  */
        =    {emit(OPR,$2);  }
relation:  '='     /*  97  */
        =  {$$ = EQL;  }
    |   '<'  /*  98  */
        =  {$$ = LSS;  }
    |   '>'  /*  99  */
        =  {$$ = GTR;  }
    |   '<' '>'     /*  100  */
        {$$ = NEQ;  }
    |   '<' '='     /*  101  */
        {$$ = LEQ;  }
    |   '>' '='     /*  102  */
        {$$ = GEQ;  }


arithmeticexpression:  term   /*  103  */
    |   arithmeticexpression '+' term  /*  104  */
        =    {emit(OPR,ADD);  }
    |   arithmeticexpression '-' term  /*  105  */
        =    {emit(OPR,SUB);  }
    |   arithmeticexpression 'plus' term    /*  106  */
        =    {emit(OPR,ADC);  }
    |   arithmeticexpression 'minus' term  /*  107  */
        =    {emit(OPR,SBC);  }
    |   '-' term   /*  108  */
        =    {emit(OPR,NEG);  }

term:  primary    /*  109  */
    |   term '*' primary     /*  110  */
        =    {emit(OPR,MUL);  }
    |   term '/' primary     /*  111  */
        =    {emit(OPR,DIV);  }
    |   term 'mod' primary   /*  112  */
        =    {emit(OPR,REM);  }

primary:  constant    /*  113  */
        =    {if (conlast == stringc)
                  {ii = var[sp] + 1;
                   switch ($1)
                      {case 1:  emit(LIT,getvarc(ii));  break;
                       default:
                          flag("string must be 1 or 2 chars");
                       case 2:
                          emit(LIT,maketwo(getvarc(ii+1),
                                           getvarc(ii)));   }
                   pop(1);
                  }
              else emit(LIT,$1);   }
    |   '.' constant    /*  114  */
        =    {emit(VAL,spush(nsym++));  emit(OPR,TRA);
              emit(OPR,DAT);  emit(DEF,0);
              concode($2,pricon);
              emit(OPR,DAT);  emit(DEF,spop());     }
    |   constanthead constant ')'   /*  115  */
        =    {concode($2,pricon);
              emit(OPR,DAT);  emit(DEF,spop());     }
    |   variable   /*  116  */
        =    {ii = symfind($1);
              if (ii >= 0)
                  {switch (gettype())
                      {case prot:  emit(VAL,ii);  emit(OPR,PRO);
                                   break;
                       case cprot:  emit(VAL,ii);  emit(OPR,BIF);
                                    break;
                       default:  if (!fixv[$1])  emit(VAL,ii);
                                 else emit(OPR,LOD);  }  }
              pop(1);  }
    |   '.' variable     /*  117  */
        =    {if (!fixv[$2])  emit(ADR,symfind($2));
              emit(OPR,CVA);  pop(1);  }
    |   '(' expression ')'   /*  118  */


constanthead:   '.' '('     /*  119  */
        {emit(VAL,spush(nsym++));  emit(OPR,TRA);
         emit(OPR,DAT);  emit(DEF,0);  }
    |   constanthead constant ','     /*  120  */
        {concode($2,pricon);     }


variable:   identifier    /*  121  */
        =    {undec();  fixv[$$ = $1] = 0;  }
    |   subscripthead expression ')'     /*  122  */
        =    {ii = symfind($1);  ++fixv[$$ = $1];
              if ((jj = gettype()) != prot && jj != cprot)
                  emit(OPR,INX);
              else emit(OPR,ARG);  }


subscripthead:      identifier '('    /*  123  */
        =    {undec();  fixv[$$ = $1] = 0;  ii = symfind($1);
              if ((jj = gettype()) != prot && jj != cprot)
                  emit(ADR,ii);  }
    |   subscripthead expression ','     /*  124  */
        {ii = symfind($1);
         if ((jj = gettype()) == prot || jj == cprot)
             emit(OPR,ARG);  }


constant:   string   /*  125  */
        {$$ = getvarc(var[$1]);  }
    |   number     /*  126  */
        {$$ = $1;    }


to:  'to'     /*  127  */
        =    {emit(ADR,ii = symfind(sp));
              emit(OPR,STD);  emit(DEF,spush(nsym++));
              emit(VAL,ii);  }
    ;

by:  'by'     /*  128  */
        =    {emit(OPR,LEQ);  ii = spop();
              emit(VAL,spush(nsym++));
              emit(OPR,TRC);  emit(VAL,jj = nsym++);
              emit(OPR,TRA);  emit(DEF,spush(nsym++));
              spush(jj);  spush(ii);  }
    ;

while:  'while'   /*  129  */
        =    {emit(DEF,spush(nsym++));  }

%%       /*  beginning of programs section  */
#include  "m.scan.c"




FILE:   m.def
Macro Definitions


#define true 1
#define false 0
#define quote 39
#define do_forever while(1)
#define unknown -128
#define EOF -1
#define SIGN 0100000

#define errc 0
#define identc 1
#define numbc 2
#define stringc 3
#define specl 4
#define eofc 5
#define num8 6
#define datcon 8
#define pricon 9
#define hashmask 127

#define binv 2
#define octv 8
#define decv 10
#define hexv 16

/*  size of symbol table is symsmax + 1  */
#define symsmax 4096
#define maxsyno 1023     /*  syno field is 10 bits  */
#define maxlen 16383     /*  length field is 14 bits  */
#define varcmax 127      /*  last location in varc  */
#define stackmax 29      /*  top of parsing stacks  */
/*  maximum number of levels of macro nesting  */
#define macmax 10
#define maxblk 19        /*  maximum block nesting level  */

#define foundv 2
#define basev 1

/*  symbol table fields  */
#define lastf 0
#define typef 1
#define sizef 2
#define namef 3
#define finfo 3          /*  fixed info in 'symbols'  */

/*  symbol table types  */
#define rest 15
#define undef 0
#define mact 1
#define vart 2
#define arrt 3
#define strt 4
#define labt 5
#define prot 6
#define cvart 7
#define cprot 8
#define ivart 9
#define outpt 10
#define lvect 11
#define svect 12
#define clabt 13

/*  operators for "emit"  */
#define DEF 0
#define ADR 1
#define VAL 2
#define OPR 3
#define LIN 4
#define LIT 6

/*  Polish Operators for "emit"  */
#define NOP 1
#define ADD 2
#define SUB 3
#define ADC 4
#define SBC 5
#define MUL 6
#define DIV 7
#define REM 8
#define NEG 9
#define AND 10
#define IOR 11
#define XOR 12
#define NOT 13
#define EQL 14
#define LSS 15
#define GTR 16
#define NEQ 17
#define LEQ 18
#define GEQ 19
#define INX 20
#define TRA 21
#define TRC 22
#define PRO 23
#define RET 24
#define STO 25
#define STD 26
#define XCH 27
#define DEL 28
#define DAT 29
#define LOD 30
#define BIF 31
#define INC 32
#define CSE 33
#define END 34
#define ENB 35
#define ENP 36
#define HAL 37
#define RTL 38
#define RTR 39
#define SFL 40
#define SFR 41
#define HIV 42
#define LOV 43
#define CVA 44
#define ORG 45
#define DRT 46
#define ENA 47
#define DIS 48
#define AX1 49
#define AX2 50
#define AX3 51
#define ARG 52
FILE:   m.decl
Global Variable Declarations


int ii, jj;  char *kk, tt;

int nsym;    /*  next symbol number  */

/*  npush is no. of symbols pushed for current statement  */
char npush;

/*  labflag indicates if current statement has a label  */
char labflag;

int acnt;

char conlast;

int yyline;      /*  line count  */
int yydebug;     /*  debug switch  */

/*  compiler toggles  */
char togd,   /*  debug  */
     togp,   /*  production listing  */
     togt;   /*  token listing  */

/*  line limits for toggles  */
int liml,    /*  lower limit  */
    limu;    /*  upper limit  */

char token, stype, hashcode, lastc, nextc;
int value;
char errset;
char *hentry[hashmask + 1];


/*  'symbol' is the base address of the currently referenced
    symbol table entry.  'sytop' is the current top of the
    symbol table.  'tokrel' is used to hold 'symbol' for
    certain tokens during syntax analysis, and eventually
    makes it to the 'symloc' stack corresponding to the
    element (if zero, the token was either not looked up or
    not found).  */

char symbols[symsmax + 1];   /*  symbol table  */
char *symbol, *sytop, *tokrel;

char maxsy,    /*  min(254, &symbols[symsmax] - symbol)  */
     sylast;   /*  last location filled during
                   symbol table construction   */
int syrel,     /*  relative address of current symbol  */
    syres;     /*  first symbol location after reserved words  */

/*  token accumulation  */

char varc[varcmax+1];    /*  temporary character storage  */

int varindex,    /*  next free varc location  */
    tokindex,    /*  start of accumulator in varc  */
    acclen;      /*  length of accumulated token  */

/*  parsing  stacks  */ 
char hash[stackmax+1];       /*  hash code for entry  */
int fixv[stackmax + 1],      /*  temporary use during parse  */
    var[stackmax + 1];       /*  start location in varc for entry  */
char *symloc[stackmax + 1];  /*  symbol table location  */
char sp;         /*  stack pointer  */
char *csp;       /*  symbol number stack pointer  */

/*  mactop is the current top of the macro expansion stack,
    and, when mactop is greater than zero, *macaddr[mactop]
    points to the current symbol string being expanded in the
    symbol table.   the maclen table gives the number of
    characters remaining to expand at this level.  */

char mactop, *macaddr[macmax + 1];
char maclen[macmax + 1];
char macnext[macmax + 1];  /*  holds 'nextc' for each level  */

/*  block keeps track of the current symbol table top for
    each block level.   the variable blklev points to the
    current block level in block.   the value of curlev is
    block[blklev].   blkv is a stack used for saving the
    value of npush at each level.  */

char *block[maxblk+1], *curlev;
char blkv[maxblk+1];
char blklev;

/*  buf is a structure used for buffering io  */
struct buf   {
   int fildes;       //file designator
   int numused;
   char *nxtfree;    //buffer pointer
   char buff[512];   //512 byte buffer
};

struct buf buf1;     //buffer for "plm.i.l"
struct buf buf2;     //buffer for getc
struct buf buf3;     //buffer for "plm.i.v"
FILE:  m.act.c
Procedures Invoked by Semantic Actions


#include "m.def"
#include "m.decl"


symcheck(i)
char i;  {
   if (symloc[i] >= curlev)
      {setsy(symloc[i]);
       if (gettype() != undef)
          {flag("variable redeclared");
           return (jj = true);}
       acnt--;  settype(vart);  return (jj = false);
      }
   return (jj = true);       }


symfind(i)
char i;  {
   if (i < 0)  return(-1);
   if (symloc[i] != symbols)
      {setsy(symloc[i]);
       switch (gettype())
          {case vart:   case cvart:
           case prot:   case cprot:
           case lvect:  case svect:
           case ivart:  case outpt:
              return(getsyno());
              break;
           default:
              flag("identifier cannot be a variable");
              return(-1);  }  }
   else
      {flag("undeclared variable");  return(-1);}
}


emit(a1,a2)
char a1;  int a2;  {
   if (errset)  return;
   switch(a1)   {
      case LIN:  a2 = (a2 & 017777) | 040000;  break;
      case LIT:  putc(a1 << 4,&buf1);  break;
      default:   a2 = (a2 & 007777) | (a1 << 12);  }
   putc(high(a2),&buf1);
   putc(low(a2),&buf1);     }
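
The word layout that emit builds can be tried out on its own. The following is a minimal sketch in present-day C, not part of the original listing: it assumes a 16-bit word, covers only the one-word tags (DEF, ADR, VAL, OPR, LIN) with the values given in m.def, and leaves out the three-byte LIT form. The name pack_word is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values as given in m.def; LIT is left out of this sketch. */
enum { DEF = 0, ADR = 1, VAL = 2, OPR = 3, LIN = 4 };

/* Pack a tag and an operand into one 16-bit word the way emit does:
   LIN keeps 13 operand bits under the fixed pattern 040000;
   every other tag handled here keeps 12 operand bits under a 4-bit tag. */
static uint16_t pack_word(int tag, int operand)
{
    assert(tag >= DEF && tag <= LIN);
    if (tag == LIN)
        return (uint16_t)((operand & 017777) | 040000);
    return (uint16_t)((operand & 007777) | (tag << 12));
}
```

emit then writes the high byte of such a word to the output buffer, followed by the low byte.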

/*  note that the spush and spop routines operate on
    'cstack', which is actually an area at the top
    of the symbol table.  */

spush(sn)
/*  push a symbol number onto cstack  */
int sn;  {
   if (csp <= sytop + 1)
      {tflag("cstack overflow");}
   *(--csp) = high(sn);
   *(--csp) = low(sn);
   return(sn);  }

spop()   {
   /*  pop a symbol number from cstack  */
   char i;
   if (csp >= &symbols[symsmax])
      {flag("cstack underflow");
       return(-1);  }
   i = *(csp++);
   return(maketwo(i,*(csp++)));     }
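
The downward-growing, two-bytes-per-entry stack used by spush and spop can be illustrated in isolation. This is a simplified sketch in present-day C under stated assumptions: a small private array stands in for the top of the symbol table, and the names area, top, cpush and cpop are hypothetical. The listing itself checks csp against sytop and the end of symbols instead.

```c
#include <stdint.h>

#define AREA 8                       /* hypothetical size of the shared area */
static uint8_t area[AREA];
static uint8_t *top = area + AREA;   /* plays the role of csp; starts past the end */

/* Push: high byte first, then low byte, moving downward.
   Returns 0 when fewer than two free bytes remain. */
static int cpush(uint16_t sn)
{
    if (top - area < 2) return 0;
    *--top = (uint8_t)(sn >> 8);
    *--top = (uint8_t)(sn & 0xFF);
    return 1;
}

/* Pop: low byte first, then high byte. Returns -1 on underflow. */
static int cpop(void)
{
    if (top >= area + AREA) return -1;
    uint8_t lo = *top++;
    uint8_t hi = *top++;
    return (hi << 8) | lo;
}
```

Because the low byte sits at the lower address, the pop order matches the (low, high) argument order that maketwo expects.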

procode(sy)
/*  emit code for a <PROCEDURE HEAD>  */
int sy;  {
   emit(VAL,spush(nsym++));
   emit(OPR,TRA);
   setsy(sy);
   emit(DEF,getsyno());     }


concode(v,t)
/*  emit code for constants  */
int v;  char t;   {
   char i, j;
   if (conlast == stringc)
      {for (i = 1;  i <= v;  i++)
          {j = getvarc(i + var[sp]);
           if (t < datcon)  putc(j,&buf3);
           else emit(LIT,j);
          }
       pop(1);  return(v);
      }
   /*  constant is a number  */
   if (t >= datcon)
      {emit(LIT,v);  return((v > 255 || v < 0) + 1);  }
   /*  initial constant  */
   switch (t)   {
      case 0:  return(0);
      case 1:  putc(v,&buf3);
               if (conlast != num8)
                   flag("single byte constant required");
               return(1);
      case 2:  putw(v,&buf3);
               return(2);   }    }
FILE:  m.aux.c
Auxiliary Procedures


#include "m.def"
#include "m.decl"

char high(n)     /*  return high byte of an integer  */
int n;        {
   return (n >> 8);  }

char low(n)      /*  return low byte of an integer  */
int n;        {
   return (n);  }

int maketwo(i,j)
/*  return 16 bit value constructed from i and j  */
char i, j;    {
   return ((j << 8) | (i & 0377));  }

int norm(i)
char i;       {
   /*  ensures that chars with msb = 1
       are converted to integers in the
       range (128,255) rather than to
       negative integers  */
   return (i < 0 ? 256 + i : i);     }
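
The four byte helpers above can be checked outside the compiler. The sketch below restates them in present-day C, assuming a 16-bit word and using fixed-width types in place of the listing's char and int; the names high_byte, low_byte, make_two and norm_byte are illustrative and do not appear in the listing.

```c
#include <stdint.h>

/* High-order byte of a 16-bit word (the listing's high). */
static uint8_t high_byte(uint16_t n) { return (uint8_t)(n >> 8); }

/* Low-order byte of a 16-bit word (the listing's low). */
static uint8_t low_byte(uint16_t n) { return (uint8_t)(n & 0xFF); }

/* Rebuild a word from low byte i and high byte j (the listing's maketwo). */
static uint16_t make_two(uint8_t i, uint8_t j) { return (uint16_t)((j << 8) | i); }

/* Map a signed char to 0..255, as norm does for chars with the msb set. */
static int norm_byte(signed char c) { return c < 0 ? 256 + c : c; }
```

Splitting a word with high_byte and low_byte and rejoining it with make_two(low, high) returns the original word, which is the property spush and spop rely on.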


push(i)
/*  stack the last token in varc  */
char i;  {
   npush++;
   if (++sp > stackmax)
      {flag("stack overflow");  sp = 0;  npush = 0;}
   var[sp] = tokindex;
   varc[tokindex] = acclen;
   tokindex =+ acclen + 1;
   /*  varc is ready for another token  */
   fixv[sp] = i;
   hash[sp] = hashcode;
   symloc[sp] = tokrel;     }

pop(n)
/*  remove n tokens from the stacks  */
char n;  {
   if (labflag)  {n =+ labflag;  labflag = 0;}
   for (;  n > 0;  n--)
      {npush--;
       if (sp < 0)
          {flag("stack underflow");
           sp = -1;  npush = 0;  return;}
       tokindex =- varc[var[sp--]] + 1;
      }  }
FILE:  m.err.c
Error Routines


#include "m.def"
#include "m.decl"

flag(err)
char *err;        {
   printf("\ncompile error, line %d  %s\n",yyline,err);   }

tflag(err)
char *err;        {
   flag(err);
   exit();  }


errfix()     {
   /*  procedure for error recovery following
       discovery of a syntax error on input  */
   errset = true;
   pop(npush);      }


undec()      {
   /*  check for undeclared variables  */
   if (fixv[sp] != foundv)
      {flag("variable undeclared");
       enter(sp,undef,0,1);
       setsy(sytop - *sytop);
       symloc[sp] = symbol;
       fixhcoll();
      }  }


FILE:   m.scan.c
Lexical Analysis Routines


/*  lexical analysis  */

char getvarc(i)
char i;
{return (varc[norm(i)]);}

char gnc()
{  /*  get next input char  */
   char i;  int j;
   while (true)
      {if (((i=getc(&buf2)) != '\r')    &&
           (i != '\n'))
           return(i);
       else
          {if (togd)
              {if (yyline == liml)  yydebug = true;
               if (yyline == limu + 1)  yydebug = false;
              }
           emit(LIN,++yyline);
          }
      }    }

char readinp()       {
   char c;
   if (mactop > 0)      /*  then expanding a macro  */
      {if (!(--maclen[mactop]))     /*  maclen == 0  */
          /*  then end of macro expansion: restore nextc  */
          return (macnext[--mactop]);
       /*  otherwise continue expansion  */
       return (*(++macaddr[mactop]));
      }

   /*  otherwise read from input device  */
   return (gnc());
}


zeroacc()
{  /*  zero accum parameters  */
   stype = hashcode = acclen = value = 0;  }

saver()
{  /*  save characters in the accumulator, and compute
       the hashcode  */
   int i;
   hashcode = (hashcode + nextc) & hashmask;
   if ((i = ++acclen + tokindex) >= varcmax)
      {flag("vo");
       acclen = 0;  }
   else varc[i] = nextc;    }
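
The hash code accumulated by saver is a running sum of character codes, masked after each step. The sketch below shows the same computation over a whole name at once. It is an illustration in present-day C, not the listing's code: HASHMASK is assumed to be 127, and hash_name is a hypothetical name.

```c
#define HASHMASK 127   /* assumed mask value; keeps the code in 0..127 */

/* Sum the character codes of s, masking after every addition,
   as saver does one character at a time. */
static int hash_name(const char *s)
{
    int h = 0;
    while (*s)
        h = (h + (unsigned char)*s++) & HASHMASK;
    return h;
}
```

Because addition is commutative, names that are permutations of one another share a hash code; the hash collision field in each symbol table entry chains such names together.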
char numeric()
{  /*  return true if nextc is numeric  */
   return (norm(nextc-'0') <= 9);  }

char hex()
{  /*  return true if nextc is hexadecimal  */
   return(numeric() || (norm(nextc-'a') <= 5));  }

char letter()
{  /*  return true if nextc is a letter  */
   return (norm(nextc-'a') <= 25);  }

char alphanum()
{  /*  return true if nextc is alphanumeric  */
   return (numeric() || letter());  }


gettoken()  {/*  get tokens for the parser  */
   char b, d, i, neg;
   int v;

   zeroacc();

   {  /*  find initial character  */
      {token = 0;
       while (token == 0)
          {  /*  deblank input  */
           while ((nextc == unknown)
                  || (nextc == ' ')
                  || (nextc == '\t'))
               nextc = readinp();

           /*  check symbol class  */
           if (letter())  token = identc;  else
           if (numeric())  token = numbc;  else
           if (nextc == quote)
              {token = stringc;  nextc = unknown;}  else
           /*  this must be a special char  */
              {lastc = nextc;  saver();  nextc = unknown;
               if (lastc == '/')
                  {  /*  may be a comment  */
                   if ((nextc=readinp()) == '*')
                      {while (!(((nextc=readinp()) == '/')
                                && (lastc == '*')))  lastc = nextc;
                       nextc = unknown;  zeroacc();}
                   else     /*  just a /  */  token = specl;}
               else
               if (lastc == EOF)  token = eofc;
               else token = specl;
               if (token != 0)  return;}}
       /*  end of checks for symbol class  */}
    /*  end of check for token = 0  */  }

   /*  symbol type discovered, scan remainder  */
   while (true)
      {if (nextc != unknown)  saver();
       lastc = nextc;  nextc = readinp();

       if (token == identc)
          {if (nextc == '$')  nextc = unknown;  else
           if (!alphanum())  return;}
       else
       if (token == numbc)
          {if (nextc == '$')  nextc = unknown;  else
           if (!hex())  /*  end of number found  */
              {if ((nextc=='o') || (nextc=='q'))  stype = octv;
               else
               if (nextc=='h')  stype = hexv;
               if (stype > 0)  nextc = unknown;
               else
               if (lastc=='b')
                  {--acclen;  stype = binv;}
               else
               if (lastc=='d')
                  {--acclen;  stype = decv;}
               else stype = decv;

               /*  now convert number to binary  */
               value = 0;  neg = false;
               for (i=1;  i<=acclen;  i++)
                  {if ((d=getvarc(i+tokindex)) >= 'a')
                       d=d-'a'+10;  else d=d-'0';
                   if ((b=stype) <= d)  token = errc;
                   v = value;  value = d;
                   while (b =>> 1)
                      {if (v & SIGN)  token = errc;
                       v =<< 1;
                       if (b & 1)
                          {if ((value | v) & SIGN)
                               neg = true;
                           value =+ v;
                           if (neg && !(value & SIGN))
                               token = errc;
                          }
                      }
                  }
               /*  binary equivalent is in 'value'  */
               return;}}
       else
       if (token == stringc)
          {if (nextc == quote)
              {if ((nextc=readinp()) != quote)  return;}}
      }
}
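
The number conversion in gettoken multiplies the running value by the base using only shifts and adds: it walks the bits of the base from bit 1 upward and adds the correspondingly shifted value wherever a bit is set. The sketch below isolates that step in present-day C. It is an illustration under stated assumptions, not the listing's code: the overflow and digit-range checks are omitted, and the names accumulate and convert are hypothetical.

```c
#include <assert.h>

/* Return value*base + digit using shifts and adds only.
   Bit 0 of base is never examined, so base must be even
   (2, 8, 10 and 16 are the bases defined in m.def). */
static unsigned accumulate(unsigned value, unsigned base, unsigned digit)
{
    unsigned v = value, result = digit, b = base;
    assert((base & 1) == 0);
    while ((b >>= 1) != 0) {
        v <<= 1;                    /* v is value times the current bit weight */
        if (b & 1) result += v;
    }
    return result;
}

/* Convert a string of lower-case digits in the given base. */
static unsigned convert(const char *digits, unsigned base)
{
    unsigned value = 0;
    for (; *digits; digits++) {
        unsigned d = (*digits >= 'a') ? (unsigned)(*digits - 'a' + 10)
                                      : (unsigned)(*digits - '0');
        value = accumulate(value, base, d);
    }
    return value;
}
```

For base 10 the loop adds 2v and 8v, which gives 10v without a multiply instruction.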


prntok()     {
   /*  print token info  */
   int i;
   putchar('\n');
   for (i = 1;  i <= acclen;  i++)
       putchar(varc[tokindex+i]);
   printf("\nt=%d s=%d l=%d v=%l h=%d",
          token,stype,acclen,value,hashcode);
   putchar('\n');   }


yylex()  {
   /*  lexical analyzer -- interface between
       yyparse and gettoken      */
   char i;

   tokrel = symbols;
   do_forever  {
      gettoken();
      if (togt && yyline >= liml && yyline <= limu)
          prntok();
      switch (token)   {
         case eofc:
            return ('\0');
         case specl:
            return (lastc);
         case stringc:
            push(0);
            conlast = stringc;
            yylval = sp;
            return (string);
         case numbc:
            conlast = (high(value)) ? numbc : num8;
            yylval = value;
            return (number);
         case identc:
            lookup();
            tokrel = symbol;
            if (found())
               switch (gettype())   {
                  case rest:
                     return (getresno() + 256);
                  case mact:
                     /*  start macro expansion  */
                     /*  save lookahead character for
                         restoration following the
                         macro expansion       */
                     macnext[mactop] = nextc;
                     nextc = unknown;
                     if (++mactop > macmax)
                        {mactop = 0;  flag("md");}
                     /*  set up definition  */
                     maclen[mactop] = getmsize() + 1;
                     macaddr[mactop] = getmdef();
                     break;
                  default:
                     push(foundv);
                     yylval = sp;
                     return (identifier);  }  /*  end of if(found())  */
            else if (acclen == 3
                     && getvarc(tokindex+1) == 'e'
                     && getvarc(tokindex+2) == 'o'
                     && getvarc(tokindex+3) == 'f')
               return ('\0');  /*  eof  */
            else  {   /*  unknown identifier  */
               push(0);
               yylval = sp;
               return (identifier);  }
            /*  end of unknown  */
            break;       /*  end of case identc  */
         case errc:
            flag("number conversion error");
            yylval = value;
            return (number);
      }  /*  end of switch(token)  */
   }  /*  end of do_forever  */
}  /*  end of yylex()  */




FILE:   m.sym.c
Symbol Table Routines


#include "m.def"
#include "m.decl"

setsy(a)
char *a;   {
   int i;
   /*  set symbol to point to symbols[a - symbols]  */
   symbol = a;
   syrel = symbol - symbols;
   /*  set maxsy so no overflow can occur
       when filling the symbol table    */
   if (high(i = csp - 1 - symbol) == 0)
       maxsy = low(i) & 0376;  else
       maxsy = 254;
   /*  note that maxsy <= 254  */   }


/*  the getxxx procedures which follow assume
    that symbol is set to the base of the
    currently referenced symbol table entry  */

char getsym(i)
char i;      {
   return (symbol[norm(i)]);   }

char getlast()   {
   /*  get the value of the 'last' field  */
   return (getsym(lastf));  }

char gettype()   {
   /*  get the value of the 'type' field  */
   return (getsym(typef) & 017);    }

char getsize()   {
   /*  get the value of the 'size' field  */
   return (getsym(sizef));  }

char getname(i)
char i;      {
   /*  get character i of the 'name' field  */
   return (getsym(norm(i) + finfo));      }

int gethcoll()   {
   /*  get the hash collision field  */
   char i;
   i = getsize() + finfo - 2;
   return (maketwo(getsym(i),
                   getsym(i + 1)));}

char getresno()      {
   /*  get the reserved word number  */
   return (getsym(getsize() + finfo));  }

char getmsize()      {
   /*  get macro size  */
   return (getsym(getsize() + finfo));  }

char *getmdef()    {
   /*  get the absolute address of
       macro definition base -1  */
   return (norm(getsize()) + finfo + symbol);  }

int getsyno()    {
   /*  get the symbol number  */
   /*  assumes 10 bit field  */
   char i;
   i = getsize() + finfo;
   return (maketwo(getsym(i), getsym(i+1) & 03));  }

char getprec()   {
   return ((getsym(typef) & 0160) >> 4);  }

char getbased()      {
   /*  get the based variable field  */
   return (getsym(typef) < 0);  }

int getbsyno()   {
   /*  get the bsyno field  */
   /*  assumes a 10 bit field  */
   char i;
   i = getsize() + finfo + 2;
   return (maketwo(getsym(i), getsym(i+1) & 03));    }

int getlen()     {
   /*  get the length field  */
   /*  assumes a 6 bit (short) or 14 bit (long) field  */
   char i;  int l;
   i = getsize() + finfo + (getbased() ? 3 : 1);
   l = norm(getsym(i)) >> 2;
   return ((gettype() == lvect) ?
           (norm(getsym(i+1)) << 6) | l  :  l);  }


/*  the  setxxx  orocedures  which  follow  assume 
that  symbol  is  set  to  the  base  of  the 
currently  referenced  symbol  table  entry  */ 


set  sym ( i / x ) 

char  i  r  x  ',  { 

if  (norm(sylast 
symbol  [nom(i)J 

set  t  ype ( t ) 

chart;      < 


i)  >  norm(maxsy))  t f 1 ag ( " t o" ) ; 
x;    > 
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set svm( t ypef / (getsym(t ypef )  &  0360)  J  t);  } 

setsi?e(s) 

chars;       < 

set  sym (si  zef  r s) $     > 

sethcol  1  (he) 

int  hcl      { 

char  i ; 

setsvm((i  =  getsizeO  ♦  finfo  -  2)»low(hc)); 

s  e  t  s  y  m  (  i  +  lfhigh(hc))?  > 

set  resno ( i ) 

/*  set  reserved  word  number  */ 

char  i ; 

{set sym ( f i n f o  +  get  si ze( ) r i ) ? } 

set  syno ( )    { 

/*  set  the  symbol  number  field  */ 

/*  assumes  10  bit  field  */ 

cnar  i  ; 

if  (nsym  >  maxsyno)  tflagC'too  many  symbols"); 

setsym((i  =  getsizeO  +  finfo)*  low(nsym)); 

setsymCi  +  1/  high(nsym++)  J  (getsym(i+l)  &  0374)); 

return (nsym  -  1);  } 

set  prec (p) 

/*  set  the  precision  field  */ 

char  p;      { 

setsym(typef  ,  (get  sym  ( t  ypef )  &  0217)  !  (p  <<  4));  > 

setbased(b)

/* set the based variable field */

char b; {

setsym(typef, (getsym(typef) & 0177) | (b ? 0200 : 0)); }

setbsyno(i)

/* set the bsyno field of a based variable
entry to the symbol number of the base */
/* assumes a 10 bit field */
int i; {
char j;

setsym((j = getsize() + finfo + 2), low(i));
setsym(j + 1, high(i) | (getsym(j+1) & 0374)); }

setlen(l)

/* set the length field */

/* assumes a 14 bit (long) field */

int l; {

char i;

if (l > maxlen) flag("vector length too large");

/* based field must have been set already */

i = getsize() + finfo + (getbased() ? 3 : 1);

setsym(i, (low(l) << 2) | (getsym(i) & 03));
setsym(i + 1, l >> 6); }
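The get/set procedures above pack several fields into shared bytes: the type byte holds the type, precision and based flag, and the 10-bit symbol number shares its second byte with the low bits of the length. The following is a minimal sketch of that packing in present-day C, not part of the original listing. All names in it are hypothetical, it models only the non-based layout (length field starting one byte past the symbol number), and it assumes the high part of a two-byte value is stored in the higher-addressed byte.

```c
#include <assert.h>

typedef unsigned char byte;

/* Type byte (assumed layout, read off the octal masks in the listing):
   bits 0-3 = type, bits 4-6 = precision, bit 7 = based flag. */
static byte set_type(byte b, unsigned t)  { return (byte)((b & 0360) | (t & 017)); }
static byte set_prec(byte b, unsigned p)  { return (byte)((b & 0217) | ((p & 07) << 4)); }
static byte set_based(byte b, int based)  { return (byte)((b & 0177) | (based ? 0200 : 0)); }
static unsigned get_type(byte b)  { return b & 017; }
static unsigned get_prec(byte b)  { return (b & 0160) >> 4; }
static int      get_based(byte b) { return (b & 0200) != 0; }

/* Three-byte field area f (hypothetical stand-in for the info fields):
   f[0]          = low 8 bits of the symbol number
   f[1] bits 0-1 = high 2 bits of the symbol number
   f[1] bits 2-7 = low 6 bits of the length
   f[2]          = high 8 bits of the length */
static void put_syno(byte *f, unsigned n)
{
    f[0] = (byte)(n & 0377);
    f[1] = (byte)((f[1] & 0374) | ((n >> 8) & 03));   /* keep length bits */
}

static unsigned get_syno(const byte *f)
{
    return f[0] | ((unsigned)(f[1] & 03) << 8);
}

static void put_len(byte *f, unsigned l)
{
    f[1] = (byte)(((l & 077) << 2) | (f[1] & 03));    /* keep symbol-number bits */
    f[2] = (byte)((l >> 6) & 0377);
}

static unsigned get_len(const byte *f)
{
    return (unsigned)(f[1] >> 2) | ((unsigned)f[2] << 6);
}
```

Because each setter masks out only its own bits before OR-ing in the new value, the fields can be written in any order without disturbing a neighbour that shares the byte. The listing tests the based flag with a signed comparison (`getsym(typef) < 0`); the sketch uses an explicit mask instead so that it does not depend on whether `char` is signed.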

int found() {

/* returns true if symbol does not address
the base of the 'symbols' vector */

return (syrel); }

lookup() {

/* look for accumulator match in symbols
based upon current value of hashcode */
char i, *j;

setsy((j = hentry[hashcode]) ? j : symbols);
/* 'symbol' is set to the top-most symbol
with this hash code */

/* the value of the 'found' procedure is false
if the symbol name cannot be found in the table */
while (found())
{/* 'symbol' points to possible match in table */
if (getsize() == acclen + 2) /* then length match */
for (i = 0; getname(i) == getvarc(i+1+tokindex); )
if (++i == acclen) return;
/* no match, so look again */
setsy(gethcoll() + symbols);
/* 'symbol' is now set to the next symbol
with this hash code */
}
}
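The lookup procedure above walks a chain of entries that share a hash code, newest entry first, and treats the base of the symbol vector as "not found". A minimal sketch of the same idea in present-day C follows; it is not part of the original listing. The table sizes, the structure layout and every identifier are assumptions made for illustration, and the additive masked hash mirrors the recomputation shown in exitblk rather than the listing's own hash procedure, which is not visible here.

```c
#include <assert.h>
#include <string.h>

#define NBUCKET 8    /* hypothetical number of hash codes (power of two) */
#define MAXSYM  16   /* hypothetical table capacity */

struct entry {
    const char *name;
    int next;        /* index of the next older entry with this hash code, 0 = end */
};

/* Slot 0 is reserved so that index 0 can mean "not found",
   in the way the listing uses the base of the symbols vector. */
static struct entry table[MAXSYM];
static int head[NBUCKET];   /* newest entry for each hash code, 0 = empty */
static int top = 1;         /* next free slot */

static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = (h + (unsigned char)*s++) & (NBUCKET - 1);
    return h;
}

/* Add a name; the new entry becomes the head of its chain.
   Returns the entry index, or 0 if the table is full. */
static int sym_enter(const char *name)
{
    unsigned h;
    if (top >= MAXSYM)
        return 0;
    h = hash_name(name);
    table[top].name = name;
    table[top].next = head[h];
    head[h] = top;
    return top++;
}

/* Return the newest entry with this name, or 0 if there is none. */
static int sym_lookup(const char *name)
{
    int i = head[hash_name(name)];
    while (i != 0 && strcmp(table[i].name, name) != 0)
        i = table[i].next;
    return i;
}
```

Since the newest entry heads each chain, a name declared again in an inner block is found before the outer declaration, which is the behaviour a block-structured language needs; removing the inner entries on block exit then only requires resetting each chain head to the link stored in the removed entry.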


setchar(sl, vl)

/* place characters from varc into symbol table
starting at vl in varc and sl in symbol. the
length of the transfer is obtained from varc(vl). */
char sl, vl; {
char i;

i = getvarc(vl);
while (i)
{setsym(sl, getvarc(++vl));
sl++; i--;
}}


setname(s)

/* set size, name, and hcoll fields
from var at s */
char s; {
char k;
setsy(sytop);
setsize(getvarc(k = var[s]) + 2);
setchar(namef, k);
/* temporarily store hashcode in hcoll field */
sethcoll(hash(s)); }
fixhcoll() {

/* fix the hash chains using the hashcode
value stored in the hcoll field */
/* assumes symbol has been set */
char *p; int i;
sethcoll((p = hentry[i = gethcoll()]) > 0 ?
(p - symbols) : 0);
hentry[i] = symbol; }

syfin() {

/* finish construction of a symbol table entry,
assuming the highest field in the entry was
filled last (thus setting sylast). */

/* note that sylast <= 25a */

setsy(sytop =+ (++sylast));

/* now addressing next symbol table entry */

setsym(lastf, sylast); }


enterblk() {

/* enter a new block level */
if (++blklev > maxblk)
{flag("bo"); blklev = 2;}
blkv[blklev] = npush; npush = 0;
curlev = block[blklev] = sytop;
}


exitblk() {

/* exit current block level */
char h, j, i; char *p;
/* remove innermost symbol table entries */
if (--blklev < 1) blklev = 1;
setsy(sytop); p = sytop;
while (p > curlev)
{p =- norm(getlast());
setsy(p);
/* entry removed; fix hash entry, if necessary */
if (i = getsize()) /* > 0 then recompute hashcode */
{h = 0;
for (j = 0; norm(--i) > 1; j++)
h = (h + getname(j)) & hashmask;
hentry[h] = gethcoll() + symbols;
}
}

/* remove any currently expanding macros */
while ((macaddr[mactop] > p) && mactop > 0)
--mactop;
/* reset current level */
npush = blkv[blklev];
curlev = block[blklev]; }


enter(ptr, t, p, l)

/* make an entry in the symbol table */
char ptr, t, p; int l; {
setsy(sytop);
settype(t);
setprec(p);
if (ptr > 0)

{setbased(flxv[ptr]);
setname(ptr); }

setsize(0); }
setsyno();

let,Cefnu/rr]    ==   ""^    -tb.ynoCOJ, 

change(t, p, l, n)

/*  change  the  tvoo   ~ 

int l, n; {

setsy(sytop);

for (; n > 0; n--)

sluypl^)?0'  "  "•"<»«>.««)„ 

setlen(l);
}

setsy(sytop); }


compress(ptr, n)

/* remove the second byte of the length field
from n symbol table entries starting at ptr */
char *ptr; int n; {

int i, j, k; char *p;

if (!n) return;

setsy(ptr);


/* fix hash chains for first entry */
fixhcoll();


ot  r 
for 


=  +  f  i  n  f  o  + 

Ci  =  l; 


2); 


i  <nn°r-+Q:)tSi2e(n  +  ^etbasedO  ? 
<setsy(ptrti  ); 

k  =  finfo  t  normCgetsi  z*n  >  +  r 

—  symboMOJ;      yecsi*eU)  +  (getbasedC)  ?  3  :  l); 

for (j = 0; j <= k; j++)

ptr[j] = symbol[j];

/* entry is now in its final position,
so fix hashcode chains */
setsy(ptr);
fixhcoll();
D  t  r    =  +    (<    +     j  ;

PtrlOJ     =    ptrfnj     -    l;

sytop = ptr;
setsy(sytop); }